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Add-ins for Office 2007 


-Tuler Chessman 

Data Mining Add-ins for Office 2007 help you develop BI 
by using the power of SQL Server 2005 and Microsoft Office 
Excel 2007 to analyze your organization’s data. 
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l.CutbMyTable1] Discover why SQL Server’s built-in dependency tracking functionality 
doesn’t work well and how the dependency checker avoids the same 
INTO [dbol.[Cu problems. 
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A Practical Approach to Database Mirroring 


-Sean McCown 

Looking for high availability? You can implement database mirroring as 
an alternative to expensive third-party solutions. Here’s how. 
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SOL Server Integration Services 


-Rodney Landrum 

Create an SSIS package that connects to all your SQL Server machines 

and collects version, configuration, existing databases, and scheduled jobs 
information into a central repository and then generates a handy report 

for your staff. 
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SOL Server Consolidation 
-Allan Hirt 


Learn what steps to take to assess your SQL Server environment and 


determine whether planning a workload consolidation makes sense. 
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DATETIME Calculations, Part 4 

—ltzik Ben-Gan 

This fourth part of a series about datetime challenges 
and calculations focuses on techniques for calculating 
the date of the first and last occurrences of a weekday 
in a month, based on a given event date. 
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Consolidating Databases 

—Michelle A. Poolet 

SQL Server sprawl is becoming an increasing headache 
for DBAs everywhere. But database consolidation’s no 
picnic either. A centralized structure will ensure you 
keep your consolidation as simple as possible. 
InstantDoc ID 95405 


SOLUTIONS BY DESIGN. . 27 


www.sqimag.com 


May 2007 


Vol. 9, No. 5 


departments 


EDITORIAL. ... TT 3 


Licensing Answers 
—Michael Otey 
InstantDoc ID 95622 


READER TO READER..... 5 


READER CHALLENGE ....7 


InstantDoc ID 95501 


MARKETPLACE ........ 46 
SELECT TOP(X) ........ 48 
Virtualizing SQL Server 

—Michael Otey 


InstantDoc ID 95349 


DILBERT '6 a0 40 4G w awn 48 


products 


REVIEW ose cone a cemaena eats 40 


SonaSafe for SQL Server 

—John Green 

Simplify your SQL Server backup and recovery efforts 
with this disk-to-disk backup solution. 
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—Jason Picker 
This BI tool offers a rich set of visualizations to provide 


insight into your data. 
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Check out new and improved SQL Server—related 
products. 
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Sneak a peek at XMLSpy® 2007, 
and see how vital it is to master XML. 


Revealed in XMLSpy 2007: 
e Completely redesigned graphical WSDL editor 


e New XPath Analyzer window that stores multiple 
expressions and results 


e Advanced find-in-files search and replace capability 


Altova® XMLSpy, the industry standard XML development 
environment, is indispensable for modeling, editing, trans- 
forming, and debugging XML-related technologies. Illuminate 

your strategy with the world's leading XML editor, the original 
graphical schema designer, a code generator, file converters, 
debuggers, profilers, support for XSLT, XQuery, WSDL, SOAP, 
and a wealth of brilliant XML utilities and enlightened 
usability aides. Become a markup mastermind! 
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ome of the most exciting developments 

in IT technology today are in processors 
and virtualization. Processor technology has 
been advancing faster than ever before. The 
advent of 64-bit x64 technology has been 
followed quickly by dual-core CPUs, quad- 
core CPUs, and integrated virtualization 
capabilities at the CPU level. Virtualization 
has also been rapidly evolving. Not so long 
ago, all of the major virtualization products 
were high-priced offerings from Microsoft 
and VMware. Now, both of Microsoft’s vir- 
tualization products, Virtual PC 2007 and 
Virtual Server 2005 R2, are free. 

With the rapidly growing adoption of 
advanced processor and virtualization tech- 
nologies, chances are that your business is 
either already using or is preparing to use 
one or both of these technologies. As you 
prepare to use either multicore CPUs or 
virtualization, you must be aware of the 
licensing ramifications for the OSs that 
support them and applications such as SQL 
Server 2005. 

Licensing can be confusing for SQL 
Server users because SQL Server is licensed 
per CPU. Both multicore processors and 
virtualization affect the number of proces- 
sors, albeit in different ways. With multicore 
technology, Microsoft’s position is that you 
need to license according to socket, not 
according to the number of processing cores. 
In other words, if you’ve just purchased 
a new quad-core server that has a single 
quad-core CPU, then you'd buy one license 
for one CPU, not four. (The CPU might 
have four processing cores, but there’s just 
one socket on the motherboard. So for SQL 
Server, youd license a single CPU.) Similarly, 
if you purchase a dual-processor quad-core 
server that has two quad-core processors, 
you'd need a two-processor license. 

Virtualization raises similar questions. 
What are the licensing requirements if you 
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buy a powerful 8-way server that’s intended 
for server consolidation, then you install 
four guest virtual machines (VMs), each 
running a different SQL Server instance? 
Do you need to license the SQL Server 
instances as having eight CPUs? The 
answer is no. The SQL Server instances that 
run in the guestVMs are licensed according 
to the number of virtual processors that are 
configured in the VM, not the number of 
CPUs that are available in the host. There- 
fore, if you configure each of your four 
guest VMs with a single virtual CPU, they’d 
each use a single CPU license. If one of the 
VMs needed additional processing power 
and you configured it to use two virtual 
CPUs, the SQL Server instance running on 
that VM would need to be licensed for two 
CPUs. The important thing to remember is 
that when running on aVM, SQL Server is 
licensed according to the CPUs in the VM. 
It doesn’t matter how many CPUs are in 
the physical host. 

With SP2, SQL Server 2005 Enterprise 
Edition now offers unlimited licensing of 
virtual SQL Server 2005 instances. And 
Microsofts recent virtualization licensing 
changes enable Windows Server 2003 
R2 Enterprise Edition to run four active 
virtual Windows Instances and Windows 
Server 2003 R2 Datacenter, which allows 
an unlimited number of active Windows 
instances. These changes could be beneficial 
and cost-effective for customers who are 
considering virtualization technology. If 
you have any questions about SQL Server 
or virtualization, be sure to visit our forums 


at hhttp://forums.windowsitpro.com a 


forum/default.aspx 
InstantDoc ID 95622 
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ApexSQL Diff Database comparison and synchronization 
ApexSQL Doc Database Documentation into HTML and CHM 
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ApexSQL Script Database scripting, packaging and deployment 
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Leverage Clustered Indexes to 
Avoid Bookmark Lookups 

Many DBAs and T-SQL developers use 
nonclustered indexes to avoid table scans, 
thereby improving query performance. 
Frequently, this type of performance tuning 
works well. However, with large tables, 
costly bookmark lookups often occur as 
a result. I devised a technique to improve 
the performance of queries that run slow 
because of bookmark lookups. This tech- 
nique also demonstrates that DBAs and 
T-SQL developers often know more than 
the query optimizer. Unlike the query 
optimizer, they can make assumptions based 
on their knowledge of the data and devise 
creative solutions. 

Before I tell you about the technique, 
let me tell you a bit about the problem. 
My client’s data warehouse environment 
contained several very large and very wide 
tables. The clustered indexes for the tables 


Searching the Web for SQL 
Topics? Give SQL Hunter a Try 

I created a SQL-specific search engine called 
SQL Hunter (http://sqlhunter.com), which 
gives priority search results to more than 45 
SQL-related Web sites—and I’m adding more 
Web sites every day. SQL Hunter uses a tab 
layout to refine searches. For example, if you 
search on T-SQL, then click the Blog tab, 
you'll get a list of T-SQL blogs. I built SQL 
Hunter using Google’s custom search engine 
and Google’s AJAX Search API. 

SQL Hunter can save you a lot of time 
when searching for SQL material because you 
don’t have to provide advanced search param- 
eters like you do when you use Google or 
other search engines. I hope you enjoy it! 

InstantDoc ID 95457. 
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TABLE | The Execution Plan Before the Changes 


were based on an identity column. The Row Subtree 
client chose to use the identity column to Rows Physical Operation | Logical Operation CPU Size 

. 6153 NULL ULL NULL NULL NULL 6,751.66 
improve batch performance and reduce 

6 tation durine INSERT operation. 6153 Table Insert Insert 0.02 0.04 15 6,751.66 
TASINCMAUON GUNDE operations. | IRE Top ap 0.00 251 6,751.61 


In addition, because the tables contained 
multiple types of date-time columns (e.g., 
Julian, local DateTime, Coordinated Uni- 
versal Time—UTC) to meet specific needs, 
there wasn’t an obviously suitable candidate 
to use to replace the identity columns as the 
range-based clustered key. 

As you would expect, the client created 


6153 Compute Scalar 
Hash Match 
Table Scan able Scan 
Hash Match Inner Join 0.00 


Table Scan 


Compute Scalar 6,751.60 
6,751.60 
0.01 24 0.19 

14.88 245 6,749.78 


0.04 


nner Join 


6406 
6153 


able Scan 


Index Seek Index Seek 


nonclustered keys for columns, which were 
typically found by using WHERE clauses 
that included a join condition. Almost every 
query used the following type of filter to 
return data from the large tables: 


WHERE Location = 
AND some_date_time 
BETWEEN adstart and dend 


a@location 


Nonclustered indexes were created for the 
(some_date_time, location) columns. This 
was optimal because the selectivity of the 
location column was low with respect to 
the size of the table. 

In this case, the client’s use of indexes and 
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T-SQL code to leverage the indexes, avoid 
table scans, and improve query performance 
was by the book. However, the client still 
was experiencing performance problems 
with some of the crucial stored procedures. 
Specifically, several of the data-load stored 
procedures took 90 minutes to run, which, 
in turn, was affecting the timeliness of report 
generation. 

After looking at the execution plan for 
these queries, I was able to determine the 
problem.As Table 1 shows, table scans weren't 
occurring and index seeks were occurring 
on the nonclustered indexes as planned. 
However, the nonclustered index seeks were 
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Clustered Indexes 


Kent Mitchell 


followed by bookmark lookups, which were 

drastically impacting the I/O and subtree 

cost, thus slowing down queries. 

I looked at several books and online 
articles and discovered three ways to avoid 
bookmark lookups: 

e Create fully covering nonclustered 
indexes. 

e Create many separate, small (i.e., non- 
wide) nonclustered indexes so that the 
query optimizer can use index joins. 

e Change the clustered index key. 


In this case, creating covering nonclustered 
indexes and creating many small indexes 
weren't viable solutions because numerous 
queries were returning many different col- 
umns. For example, one query might return 
12 columns and another query might return 
15 columns, nine of which differ from the 
columns returned by the first query. 


LISTING | Code That Retrieves the 
Clustered Identity Key’s MAX and MIN 
Values 


SELECT 
@amax_value = MAX(identity_column), 


@min_value = 
FROM Large_table 
WHERE 

location = dlocation 

AND some_date_time 

BETWEEN astart AND dend 


MINCidentity_column) 


6 May 2007 


TABLE 2 The Execution Plan After the Changes 
Row Subtree 

Rows Physical Operation | Logical Operation 1/0 CPU Size Cost 
6153 NULL NULL NULL NULL NULL 121.75 
6153 Table Insert Insert 0.02 0.00 15 121.75 
6153 Top Top 0.00 0.00 251 121.73 
6153 Compute Scalar Compute Scalar 0.00 0.00 251 12173 
6153 Parallelism Gather Streams 0.00 0.10 245 121.73 
6153 Hash Match Inner Join 0.00 0.11 245 121.63 
141 Bitmap Bitmap Create 0.00 0.03 193 0.07 
141 Parallelism Distribute Streams 0.00 0.03 193 0.07 
141 Table Scan Table Scan 0.04 0.00 193 0.04 
6153 Parallelism Repartition Streams 0.00 0.32 13 121.45 
90648 Hash Match Inner Join 0.00 0.40 713 121.14 
6406 Bitmap Bitmap Create 0.00 0.05 24 0.23 
6406 Parallelism Repartition Streams 0.00 0.05 24 0.23 
6406 Table Scan Table Scan 0.18 0.00 24 0.19 
90648 Parallelism Repartition Streams 0.00 1.82 190 120.50 


Because the client’s stored procedures 
used many different date-time columns 
to meet specific needs and because of the 
concern with fragmentation, changing the 
clustered key wasn’t a palatable option to 
the client. Had the stored procedures used 
the same date-time column for every query, 
this would have been a possible solution. 

So, I decided to creatively leverage the 
clustered index. In the client’s environment, 
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LISTING 2 Sample Rewritten WHERE 
Clause 


WHERE Location = adlocation 
AND some_date_time 
BETWEEN astart AND dend 
AND identity_coumn 
BETWEEN @max_value AND dmin_value 


the data was inserted into large, source 
tables per date (some_date_time) and loca- 
tion (location). Because of this setup, the 
data being extracted from these large tables 
physically exists on the disk in relatively con- 
tiguous chunks per date range. However, the 
query optimizer didn’t know this, so it used 
bookmark lookups to retrieve the actual data 
for the queries. To circumvent the bookmark 
lookups, I took the following steps: 

1. I rewrote part of the stored procedures’ 
code so that they retrieved the clustered 
identity key’s MAX and MIN values before 
they ran queries against large tables. As 
Listing 1 shows, these values are stored in 
the @max_value and @min_value variables. 
The identity column is clustered, so this type 
of query to retrieve the clustered key’s MAX 
and MIN values per the (some_date_time, 
location) columns is fully covered by the 
nonclustered indexes. As you might know, 
this is possible because the clustered key 
values are implicitly stored within the non- 
clustered indexes. 

2.1 added the MAX and MIN values in 
the @max_value and @min_value variables 
to the WHERE clauses. Listing 2 shows an 
example of a rewritten WHERE clause. 


These changes let the query optimizer 
perform a clustered index seek directly 
against the clustered index key that’s based 
on the identity column. As Table 2 shows, 
the costly bookmark lookups have been 
eliminated. Another positive consequence of 
the changes is the addition of parallelism. 

This simple solution addressed the major 
cause of the performance problems. The time 
it takes the data-load stored procedures to 
run has decreased from 90 minutes to only 3 
minutes (or less)—a 97 percent improvement 
in performance. This simple solution also 
proves that DBAs and T-SQL developers can 
often come up with clever solutions because 
they know their data well. SOL 

InstantDoc ID 95459. 
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Check for Objects 
Created Without SET 
Options Enabled 


H ere’s the solution to the April Reader 
Challenge. (To read the full Challenge, 
“Change the Status of Table Items,’ go to 
InstantDoc ID_95324.) 


Solution: 
Siva should first create an index on the 
OrderStatus column of the Orders table. 
Creating this index will help him quickly 
locate orders of a particular status. Addition- 
ally, he should include a column such as 
OrderDate as part of the index to answer 
more useful questions about date and status 
of the orders. Siva can create this index by 
using the following CREATE INDEX 
statement: 
CREATE INDEX idx_Order_Status_By_Date ON 
Orders("OrderStatus", "OrderDate") 
Next, Siva should use the UPDATE state- 
ment in Web Listing 1 (http://www.sqlmag 
.com, InstantDoc ID_95501) to change the 
status of orders from Processing to Shipped 
after all of the line items have been processed. 
The UPDATE statement selects all orders in 
the Orders table that are in the “Processing” 
status and ensures that all of the line items 
for every order are in the “Processed” status. 
The index on the OrderStatus and Order- 
Date columns of the Orders table improves 
the UPDATE statement’s performance. To 
test the UPDATE statement action, Siva can 
use the code in Web Listing 2. 


MAY CHALLENGE: 


Test your SQL Server savvy in this month’s 
Reader Challenge. Submit your solution 


in an email message to_challenge@sqlmag 


www.sqimag.com 


-com by May 10. Umachandar Jayachandran, 

a SQL Server Magazine technical editor, will 
evaluate the responses. We’ll announce the 
winner in an upcoming SQL Server Maga- 
zine UPDATE. The first-place winner will 
receive $100, and the second-place winner 
will receive $50. 


Problem: 

Richard is a database architect for a com- 
pany that uses SQL Server as its database. 
The company’ DBAs typically execute 
database updates by using T-SQL scripts. 
The front-end application that interacts 
with SQL Server requires several ANSI set- 
tings to be enabled when the DBAs create 
database objects such as tables, functions, 
stored procedures, and triggers. The applica- 
tion also uses features such as indexed views 
and indexes on computed columns. For the 
application to work correctly, it requires the 
following ANSI SQL SET options: 

1. ANSL PADDING should be enabled on 
all character and binary columns in tables. 
2. ANSI_NULLS and QUOTED_IDEN- 
TIFIER options should be enabled when 
stored procedures, functions, and triggers are 
created. 


Richard has recently encountered pro- 
duction problems that happen when DBAs 
execute scripts from a session that don’t have 
the required ANSI SET options enabled. 
The scripts create objects that have non- 
default settings, resulting in application 
errors and data integrity errors. To prevent 
future problems, Richard wants to include 


SQL Server Magazine 


Reader Challenge 


checks in the application and in the scripts 
that can identify objects that dont have 
the necessary SET options enabled. Help 
Richard do the following: 

1. Write code to list the columns that are 
created with ANSI_PADDING OFF 

2. Write code to list the stored proce- 
dures, functions, and triggers that were 
created or modified with ANSL.NULLS or 
QUOTED_IDENTIFIER OFF For each 
object, the setting that’s disabled should be 
shown. 

3. Create the checks in a T-SQL batch that 
can be run at the end of the script or in the 
application. The checks should also raise an 
error if any objects were found that meet the 
criteria. 


You can use the script in Web Listing 3 to 
create in the tempdb database objects that 
have some of the SET options disabled. 
SOL 

InstantDoc ID 95501 
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Umachandar Jayachandran umachandare 
yahoo.com) works on the SQL Performance Engineering Team at 
Microsoft. He specializes in online transaction processing and data 
warehousing, and he is a contributing author of SQL Unleashed, 
2nd Edition (SAMS). You can find his contributions at http://blogs 
„msdn.com/saltips and_http://blogs.msdn.com/sqlperf. 
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any analytical tools integrate with Microsoft Excel, and for a simple reason: People 

who crunch numbers for a living spend a lot of time with spreadsheets. SQL 

Server 2005’ data-mining algorithms offer powerful capabilities for analyzing data, but 

Microsoft didn’t provide any out-of-the box tools for delivering data mining to the 

desktop until recently. In February 2007, Microsoft delivered the Data Mining Add-ins 

Fe rret O ut Use ful for Office 2007, which let you take advantage of SQL Server 2005 predictive analytics 
. H in Excel 2007 and Microsoft Office Visio 2007. Let’s walk through the installation and 
In fo (ma ti on configuration of the Data Mining Add-ins and look at a comprehensive sample. We'll 


h i g d en in y our focus on one add-in tool in particular: the Data Mining Client for Excel. 


Excel data Installation and Initial Configuration 

Before installing the add-ins, make sure Excel 2007 and the Microsoft .NET Frame- 
work 2.0 are installed on your machine. Additionally, run Office Setup from the Con- 
trol Panel Add/Remove Programs applet to ensure that the .NET Programmability 
Support component is installed under Excel. (This component is included with the 
default installation of Excel.) Lastly, you'll need access to an instance of SQL Server 
2005 Analysis Services, though this instance doesn’t have to reside on your machine. I 
recommend that you also install the Adventure Works Analysis Services database sample. 
(The Microsoft article “Running Setup to Install Adventure Works Sample Databases 
and Samples,’ http://msdn2.microsoft.com/en-us/library/ms143804.aspx, explains 


how to install the sample.) 


by Tyler Chessman 
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Next, download and install the Data 
Mining Add-ins. (For download details, 
go to http://go.microsoft.com/fwlink/ 
?LinkId=82754.) In the Feature Selection 
window, which you'll see after you begin 
the installation process on your local 
machine, make sure the following features, 
at a minimum, are set to be installed on 
the local hard drive: 

e Data Mining Client for Excel, which 
lets you run through the full data- 
mining process within Excel 2007 

e Server Configuration Utility, which 
lets you set Analysis Services server 
properties specific to data mining 
and create a database for storing 
models. 


After the installation is complete, 
click Start, All Programs, SQL Server 
2005 DM Add-Ins, Server Configuration 
Utility. The configuration utility sets the 
Analysis Services Data Mining\Allow- 
SessionMiningModels server property 
to True, then creates the DMAddinsDB 
database to use for creating temporary 
and permanent data models. You can also 
view the documentation that comes with 
the Data Mining Add-ins if you prefer to 
configure your server and create a data- 
base manually. 
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A Data-Mining Primer 

At a high level, data mining is the process 
of finding information (e.g., patterns, 
trends) in large volumes of data in an 
automated or semi-automated manner. 
The classic example of the value of data 
mining is its use in cross selling. For 
example, when I buy a book online, 
the vendor might query its database for 
customers who bought the same book 
and display other titles those customers 
purchased. 

SQL Server 2005 ships with nine 
data-mining algorithms, each of which is 
suited to a different type of problem, such 
as forecasting sales, identifying fraudulent 
transactions and claims, or segmenting 
customers into different profiles. Each 
situation involves the use of historical data 
to build a model to predict a future state 
or to better understand the current state. 
Data-mining literature (including SQL 
Server Books Online—BOL) typically 
groups algorithms into different types (or 
tasks). An example of a data-mining type 
is Classification, which categorizes his- 
torical data around a predictable attribute. 
The Microsoft Decision Trees algorithm 
is a classification algorithm. 

So, data mining helps solve problems, 
but how exactly does one mine data? 
Using BOL as a reference, let’s look at the 
data-mining process in terms of six basic 
steps: 

1. Define the business problem. 

2. Prepare the historical data. 

3. Explore/validate the historical data. 
4. Build the data~mining model. 

5. Explore/validate the model. 

6. Deploy and update model. 
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The example I use is similar to Lesson 2: 
Building a Targeted Mailing Scenario in the 
data-mining tutorial that ships with SQL 
Server 2005. (For more information about 
the tutorial, see SQL Server 2005 BOL 
“Data Mining Tutorial,” http://msdn2 
smicrosoft.com/en-us/library/ms167167 
aspx.) Assuming you've installed the 
AdventureWorks Analysis Services database 
sample, you can review the BOL data- 
mining tutorial and the accompanying 
data-mining models for additional insight. 


A Data-Mining Scenario 

A typical data-muning scenario you might 
encounter 1s that of classifying customers so 
you can build a targeted mailing list. Let’s 
walk through this scenario using the BOL 
six-step data-mining process as a guide. 

Define the business problem. Let’s say 
you're working in the marketing depart- 
ment of a famous bicycle company, 
Adventure Works. The marketing director 
has decided to run a promotion for a new 
line of bicycles. You've acquired a long list 
of potential customers that includes key 
demographic data such as age, gender, 
marital status, and occupation. However, 
the marketing budget is tight. You need to 
minimize the cost of a direct mail cam- 
paign by identifying the set of potential 
customers most likely to purchase a bike, 
as indicated by customer purchase history. 
To identify these customers, you'll use the 
Data Mining Client for Excel. 

Prepare the historical data. In this 
example, the task of preparing historical 
data has already been done for you. Click 
Start, All Programs, Microsoft SQL Server 
2005 DM Add-ins, Sample Excel Data, 
to open the sample workbook named 
DMAddins_SampleData.xlsx. (You might 
want to make a backup copy of this work- 
book, which is located by default in the 
C:\Program Files\Microsoft SQL Server 
2005 DM Add-Ins folder.) Navigate to 
the Source Data worksheet to view the 
demographic data about the existing 
customers, including a field indicating 
whether they’ve purchased a bike. 
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Explore and validate the historical data. 
To explore the historical data, you'll need to 
understand the various data attributes, group 
data into buckets to reduce complexity, look 
for outliers (i.e., problematic data values that 
are far outside the expected range and that 
might adversely affect or skew a model), and 
potentially change the data. Click the Data 
Mining tab at the top of the Excel Ribbon. 
Click Explore Data to open the Explore 
Data Wizard. 

Click Next and make sure the ‘Source 
Data’! Source Data’ Table is selected. In 
the Select column drop-down menu, select 
Yearly Income. Click Next, and you'll see 
an Explore Data chart similar to the one in 
Figure 1, which distributes incomes across 
eight buckets. (You can reconfigure the 
number of buckets as needed.) By exploring 
data in this manner, you can find and, if nec- 
essary, change or delete outliers that might 
skew the data model. Go ahead and cancel 
out of the wizard. 

Let’s assume you dont want to consider 
yearly incomes above $150,000. In the Data 
Mining tab of the Ribbon, click Clean Data, 
then select Outliers to bring up the Outliers 
Wizard. Click Next, make sure the ‘Source 
Data’! Source Data’ Table is selected, and 
click Next again. In the Select column drop- 
down menu, select Yearly Income, then click 
Next. In the Specify Thresholds step, change 
the Maximum value to 150000. Click Next, 
and in the Outlier Handling step, select 


Delete rows containing outliers. Click Next 
one more time, then select Copy sheet data 
with changes to a new worksheet. A worksheet 
named Clean Data will automatically be 
created. Note that you're creating this work- 
sheet just to get an idea of how it’s done, so 
you can delete it whenever you wish. 
Within the Clean Data feature, the Re- 
label option can help you clarify or distin- 
guish data attributes. Click Clean Data and 
select Re-label. In our example, the possible 
values for the Home Owner and BikeBuyer 
columns are Yes and No. If it better suits 
your needs, you can use Re-label to change 
the Home Owner values to something 
more descriptive, such as Rent or Own. 
Build the data-mining model. Before you 
build the data model, you'll need to parti- 
tion the historical data into two buckets: one 
that you'll use to build the actual model and 
another to test the model’s accuracy. In the 
DMAddins_SampleData workbook, you 
can see the worksheets titled Training Data 
and Testing Data, which are partitions of 
the original Source Data worksheet. You'll 
use these pre-partitioned worksheets in a 
moment. But first, to get an idea of how 
this partitioning was done, you're going to 
create your own partitions on your own 
temporary worksheets, which we'll label 
Temp TrainingData and TempTestingData. 
You won't use these partitions to build the 
actual model, because partitioning is random 
and each reader’s model would be different, 
so you can delete these tem- 
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porary worksheets when 
youre done creating them. 

To partition the his- 
torical data, click the Data 
Mining tab of the Ribbon, 
click Partition Data, then 
click Next. Make sure the 
‘Source Data’! Source Data’ 
Table is selected and click 
Next again. In the Select 
Sampling Type step, stick 
with the default Split data 
into training and testing sets 
option, click Next, and keep 
the default value of 70.0 
for the Percentage of training 
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170000 


> > FIGURE | Explore Data chart 
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data field. Click Next again. 
Name the temporary 
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training worksheet TempTrainingData and 
the temporary testing worksheet Temp- 
TestingData, then click Finish. You now 
have created two worksheets containing a 
random sample of the historical data—70 
percent in the Temp TrainingData worksheet 
and 30 percent in the TempTestingData 
worksheet. You can delete these temporary 
worksheets now. 

Next you need to define a connection 
to an instance of Analysis Services, which is 
where you'll build the actual data-mining 
model. In the Data Mining tab of the 
Ribbon, click Connection. (If no connec- 
tions have been created, the command label 
will be <No Connection>.) Click New 
and enter the connection information for 
your instance and for the DMAddinsDB 
database, a database that gets created when 
you use the Server Configuration Utility I 
mentioned earlier. Click OK to close the 
Connect to Analysis Services dialog box, click 
Make Current, then close the Analysis Ser- 
vices Connections dialog box. 
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> > FIGURE 2 Classify Wizard showing Input columns 
data grid 


Now you can build the model. In the 
Data Modeling section of the Data Mining 
tab, you'll see several commands for cre- 
ating different types of models (e.g., Classify, 
Estimate, Cluster). Click Classify to start the 
Classify Wizard, then click Next and make 
sure the “Training Data’!’Training Data’ 
table is selected. Click Next again, and in 
the Column to analyze box select BikeBuyer, 
which then becomes your predictable attri- 
bute—the attribute that’s affected by other 
attributes in a manner you wish to under- 
stand. In the Input columns data grid, which 
Figure 2 shows, clear the check box for the 
ID column, which isn’t relevant to helping 
you understand whether someone is likely 
to be a bike buyer. Click Next again. 

In the Finish step, you can change the 
default structure, model name, and descrip- 
tions. We haven't talked about data-mining 
structures, but you can think of them as a 
schema definition of your historical data. By 
creating a structure, you can apply multiple 
models to your historical data without having 
to redefine the historical schema. Leave the 
Browse model option selected, select the Enable 
drillthrough option, then click Finish to deploy 
the data-mining model to the Analysis Ser- 
vices instance for processing. 

Explore and validate the model. After you 
complete the Classify Wizard, the Browse 
window will be displayed. (If you close this 
window, you can view it later by clicking 
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Browse in the Data Mining 
tab.) To make your Deci- 
sion Tree tab look like the 
one that Figure 3 shows, you need to make 
some slight adjustments. First, change the 
value in the Background drop-down box to 
Yes. By changing the background value, you 
can more easily see the likelihood of Bike 
Buyer being equal to Yes in each node: The 
darker the shading of the node, the higher 
the likelihood. Set the Show Level slider to 
4 to adjust the number of levels shown in 
the tree. Then click the Size To Fit toolbar 
button (the one that has four red arrows in 
the shape of an X). 

By looking at the decision tree, you can 
now start to make some observations. If you 
hover over (or click) the All node, you'll see 
that the 7,000 cases include 696 bike buyers. 
The first split in the tree shows that age is 
the most significant factor in influencing the 
customer’s decision to purchase a bike. The 
second split shows that the next-significant 
factor is either the number of cars owned 
or, for customers from 32 to 53 years old, 
yearly income. The dark background color 
of the Cars = 0 node shows you where the 
largest percentage of likely buyers lies within 
the four levels. To see the cases behind this 
node, right-click the node and select the 
Drill Through menu item; a new worksheet 
will be created in your workbook with the 
applicable case data. 

Next, switch to the Dependency Net- 
work tab and select the Bike Buyer node. 
Using the color coding at the bottom of 
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> > FIGURE 3 Decision Tree tab in Browse 


the tab, you can see that Bike Buyer (in 
light blue) is the selected (or predictable) 
attribute and that the orange nodes are the 
input nodes that predict Bike Buyer. The 
links, which appear as arrows, point from the 
input nodes to the predictable node.You can 
adjust how many links the viewer shows by 
adjusting the slider (located on the left side 
of the window). Lowering the slider shows 
only the strongest links. For example, if you 
lower the slider to the bottom, only the Age 
link remains, which is consistent with the 
first split in the decision tree. 

Now you need to test, or validate, the 
model for accuracy. Recall that the Source 
Data worksheet has been split into two 
partitions, one (the Training Data work- 
sheet) to build the model and the other (the 
Testing Data worksheet) to test it. In the 
Data Mining tab, you can choose among 
three commands within the Accuracy and 
Validation section. Each option takes a set of 
test cases as input. The Classification Matrix 
command runs all the test cases through 
the model and then scores, or classifies, the 
results of the prediction against the actual 
values in the test data—in other words, it 
tells you whether your model predicted 
Bike Buyer = Yes when it was supposed to. 
The accuracy chart (aka a lift chart) shows 
the results of the model’ predictive capabili- 
ties against a “random guess” and “perfect” 
model charted for comparison. The profit 
chart is similar to the accuracy chart but lets 
you input cost and revenue data to deter- 
mine the point of maximum return. 
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For our example, let’s work with the 
Classification Matrix command. Click Clas- 
sification Matrix in the Data Mining tab to 
bring up the Classification Matrix Wizard. 
Click Next three times, then, in the Select 
Source Data step, make sure the “Testing 
Data’! Testing Data’ table is selected. Click 
Next again, then click Finish; a worksheet 
named Classification Matrix will be created. 
According to this matrix, the model is 89.13 
percent accurate in terms of predicting both 
bike buyers and non-bike buyers. You might 
be dismayed to discover the model correctly 
identifies only 16.12 percent of the actual 
bike buyers. However, keep in mind that the 
model correctly identifies non-bike buyers 
97.37 percent of the time. If you recall, 
when we defined our business problem, we 
needed to minimize the cost of a direct mail 
campaign. This model identifies only a small 
subset of the likely buyers, but it eliminates 
most of the likely non-buyers. 

I like to think of the accuracy this way: 
According to the test results, the model iden- 
tified 120 likely bike buyers (2.63 percent of 
the non-bike buyers and 16.12 percent of 
the bike buyers). Of those 120, 49 were bike 
buyers, meaning the model has an accuracy 
of 41 percent. That percentage is much better 
than a random-guess approach (e.g., flipping 
a coin), which would be accurate about 
10 percent of the time (the Training Data 
worksheet consists of 3,000 rows, of which 
approximately 10 percent are bike buyers). 

On a side note, you could use another 
technique to make your model more 
ageressive. This technique, called oversam- 
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pling, involves manipulating the source data 
to increase the frequency of rarely occurring 
data. The add-in’s partitioning wizard sup- 
ports oversampling, and you can learn more 
about this technique at the Microsoft Web 
site (see “Chapter 24 - Effective Strategies 
for Data Mining,” in the SQL Server 2000 
Resource Kit at _http://www.microsoft 


part6/c2461.mspx?mfr=true).With that said, 
let’s stick with the current model and look at 


a profit chart for additional verification. 
Merely for the purposes of this training 

example, let’s make some assumptions about 

the size of our potential customer list, the 

variable and fixed costs, and gross revenue. 

Click the Profit Chart command in the 

Data Mining tab, click Next twice, then, 

in the Specify Profit Chart Parameters step, 

enter the following values: 

e Mining column to predict: BikeBuyer 

e Value to predict: Yes 

e Target population: 35000 

e Fixed cost: 5000 

e Individual cost: 5 

e Revenue per individual: 35 


Click Next and make sure the “Testing 
Data! Testing Data’ table is selected. Click 
Next again, then click Finish; a new work- 
sheet will be created. Figure 4 shows the 
profit chart that’s part of this worksheet. 

According to the chart, you maximize 
profit by targeting 11 percent of the target 
population. The worksheet also indicates 
the optimal “probability threshold” is 15.14 
percent. In other words, when the data- 
mining model makes a 
prediction, the model 
also computes a prob- 
ability for the predic- 
tion. The worksheet 
indicates we should 
target customers with 
a probability (to pur- 
chase a bike) of 15.14 
percent or higher. 
You'll see this prob- 
ability value in the 
final step. 

Deploy and update 


> P FIGURE 4 Profit chart sample 
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model. In this example, 
Excel is your client 
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tool, so no deployment is necessary. Let’s go 
ahead and run your model against a small set 
of potential customers. In the Data Mining 
tab, click Query, then click Next twice. In 
the Select Source Data step, make sure the 
‘New Customers’! Table 17’ table is selected. 
Click Next again and check that all the 
relationships (except BikeBuyer, of course) 
are properly mapped from the worksheet to 
your model attributes. 

Click Next again, then click Add 
Output. In the Name field, enter Proba- 
bilityToBuy. Select BikeBuyer from the 
Columns list, PredictProbability from the 
Column Functions list, and Yes from the 
Function Parameters list. Click OK to 
close the dialog box. Complete the Wizard 
by clicking Next and then Finish. A new 
column, Probability ToBuy, is added to the 
New Customers table. Using the probability 
threshold of 15.14 percent from the Profit 
Chart worksheet as a guide (and rounding 
your new column to the nearest hundredth), 
you should target 19 of the 78 potential 
customers in this set. You can now run a 
query against all of the potential customers 
and send the final results to the marketing 
director. Congratulations—you're finished! 


More Uses for the Data Mining 
Client for Excel 
We used the Data Mining Client for Excel 
add-in to walk through the entire data- 
mining process. You can also use the add-in 
to browse or query an existing model. For 
example, if you’ve installed the Adventure- 
Works Analysis Services database sample, 
you can establish a connection to the data- 
base (using the Connection command at 
the Data Mining tab), then browse or query 
any of the models. A practical idea is to use 
Excel data as input to a data-mining query. 
With this approach, the model is ultimately 
built and maintained by IT folks (and might 
be processed using very large volumes of 
historical data) but is available to end users 
for validation, browsing, and querying. EX 
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ependencies between database objects 

dictate the order of deployment in 
code releases. The order of object deploy- 
ment is key to the success of database 
projects, especially when code is moved 
between different environments. There- 
fore, the dependency information viewed 
by developers and DBAs must be accurate. 
Although SQL Server 2005 and SQL 
Server 2000 have many great features, 
they still don’t track object dependencies 
in a reliable fashion. Interestingly, nearly 
all third-party dependency tools rely on 
the limited information recorded by SQL 
Server, which can lead to functional and 


deployment errors. To overcome these 
checker that returns the dependency 
and deployment order of objects. This 

simple algorithm 

and some T-SQL 
show you how the dependency checker 
works and how to use it, let’s look at 
tracking functionality doesn’t work well 
and how the dependency checker avoids 


deficiencies, I created a dependency 
checker uses a 
e on the WEB 
Download the listings at 
InstantDoc ID 95428 code. Before I 
why SQL Server’s built-in dependency 
the same problems. 


SQL Server's Deficiencies in 
Detecting Dependencies 

From the application developers’ view, 
SQL Server contains four standard object 
types: tables, views, stored procedures, 
and functions. All other objects are either 
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prerequisites or attributes of objects in 
these groups. For example, logins, users, 
rules, and full-text catalogs are prereq- 
uisites, whereas triggers, foreign keys, 
and synonyms are attributes because 
they’re typically created together with the 
parent object. When database projects are 
deployed, all prerequisites are created first. 
Hence, the main challenge is to identify 
dependencies and determine the deploy- 
ment order of standard objects. With this 
goal in mind, let’s concentrate on finding 
all possible dependencies between tables, 
views, stored procedures, and functions, 
potentially through their child objects. 

Dependencies are expressed in many 
ways. A stored procedure that selects 
records from a table or view depends 
on underlying objects. Sumilarly, a view 
that selects data from a table or function 
depends on underlying objects. Another 
example is a trigger that manipulates data 
in tablel when data updates are made to 
table2; table1 therefore depends on table2. 

SQL Server 2000 uses the system table 
sysdepends to store object dependencies. In 
SQL Server 2005, dependency informa- 
tion is recorded in the system view sys.sql 
_dependencies. Both of them operate in a 
similar manner and map an object’s ID 
to each dependents ID. Explicitly, sys- 
depends contains the columns [id] and 
[depid], whereas sys.sql_dependencies uses 
[object_id] and [referenced_major_id] to 
denote the relationship between an object 
and its dependent. 
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Two problems arise with the built- 
in dependency tracking functionality 
in SQL Server. First, dependencies are 
recorded only when the dependent item 
already exists in the underlying database. 
However, in most cases, nothing prevents 
users from creating an object that depends 
on another object when the latter doesn’t 
yet exist, in which case SQL Server 
can’t detect the dependency. Second, if a 
child of one object (e.g., a trigger called 
TRG_table1_AfterUpdate that is invoked 
after data updates to table1) depends on 
another object (e.g., the trigger manipu- 
lates data in table2), the parent object 
tablel depends on table2. 

SQL Servers dependency tracking 
functionality doesn’t account for these 
scenarios. Thus, if a user generates scripts 
that contain the schema of such objects 
by using popular auto-scripting tools, the 
order of operations in the scripts is incor- 
rect. Consider the following examples: 

Example 1: A stored procedure that 
references a table is created before the table, 
as Listing 1, page 16, shows. In this case, 
each call to execute the stored procedure 
before the table is created results in errors. 

Example 2: A trigger updates a target 
table when data changes are made to the 
triggers parent table, but the trigger is cre- 
ated before the target table, as Listing 2, 
page 16, shows. Because the target table 
isn't available in the database (i.e., it hasn’t 
been created yet), all data updates to the 
parent table incur errors. 
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P USTING I Code for Example 1 


CREATE PROCEDURE CdboJ.CuspMyProc] AS 
SELECT TOP 1 * FROM 
ECdbol].CutbMyTableJ 

GO 

CREATE TABLE [dbo].CutbMyTable] 
(LMyCol1] INT, CMyCol2] NVARCHAR(32)) 

GO 


[Z USTING 2 Code for Example 2 


CREATE TABLE [dbo].CutbMyTable1] 
(CMyIntCol] INT, CMyNCharCol] 
NCHAR(10)) 

GO 

CREATE TRIGGER 
ECdbo].CTRG_utbMyTable1_Insert_Update] 

ON CdboJ.CutbMyTable1] 

AFTER INSERT, UPDATE 

AS 
INSERT INTO CdboJ.CutbMyTable2] 
([MyIntCol]) 

SELECT CMyIntCol] FROM [inserted] 

GO 

CREATE TABLE [dbo].CutbMyTable2] 
(MyIntCol INT) 

GO 


LISTING 3 Code for Example 3 


CREATE VIEW CdboJ.CuvwMyBottomView] 

aS uset "1" AS [Column1] 

ee VIEW ECdbol.CuvwMyTopView] 
ae * FROM Cdbo].CuvwMyBottomView] 
DROP VIEW Cdbol].CuvwMyBottomView] 
parent table incur errors. 

Example 3: A view selects data from 
another view that doesnt exist in the 
underlying database, as Listing 3 shows. All 
attempts to select data from the non-existent 


view will result in binding errors. 


To avoid such problems, the dependency 
checker analyzes databases for the following 
dependencies: 

e Tables that depend on other tables. 
Cross-table dependencies can occur 
when a table with a foreign-key con- 
straint depends on the referenced 
primary table. Cross-table dependen- 
cies can also occur when the child of 
one table depends on a second table, as 
illustrated by Listing 2. 

e Tables that depend on stored procedures, 
functions, or views. If a trig- 
ger selects records from views 
or functions, or if the trigger 
code executes functions or 
stored procedures, then the 
parent table depends on the 
referenced objects. 

e Views, stored procedures, 
or functions that depend on 
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other objects. This type of dependency 
can occur when views select rows from 
tables, functions, or other views. Simi- 
larly, it can occur when stored proce- 
dures or functions select data from the 
same set of objects or call other stored 
procedures or functions. 

Objects that depend on themselves. A 
table that has a self-referencing foreign- 
key constraint is considered to be self- 
dependent. Other self-dependencies 
include a trigger that updates its parent 
table and a stored procedure or func- 
tion that calls itself. For the purpose of 
project deployment, self-dependencies 
are of little interest, although they’re 
still taken under consideration by the 
dependency checker’s algorithm. 


How the Checker Works 
Foreign-key constraints are stored appropri- 
ately in SQL Server. However, dependency 
problems arise in all other cases in which 
relationships are embedded in the object 
text or in the code of child objects. Conse- 
quently, it seems natural to use a text-based 
dependency checker that parses T-SQL code 
to search for object references. 

For the purposes of the dependency 
checker, let’s define a text-based object as 
any single view, stored procedure, function, 
or trigger. One text-based object depends on 
another text-based object when the depen- 
dent objects name is referenced in the T-SQL 
code. Moreover, when an object is referenced, 
the first character before and after the object’s 
name must be one of the values listed in Table 
1. It’s also worth noting that if a left square 
bracket appears before the object’s name, the 
object name is most likely appended with a 
right square bracket. 

The dependency checker operates by 
collecting all text-based objects in the 
database as requested by the user. Input 
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and configuration parameters let the user 
process all database objects or just specify a 
collection of items. Next, the dependency 
checker retrieves the CREATE statements 
for all objects by querying sys.sql_defini- 
tions in SQL Server 2005 or syscomments 
in SQL Server 2000.The checker stores the 
statements in a temporary table. 

To find references to other objects in the 
text, all T-SQL code comments are stripped 
out. The dependency checker does this by 
finding instances of strings that start with the 
-- or /* comment character, deleting each 
comment, and storing the resulting code in 
the same temporary table. 

At this point, the functional raw code is 
at hand and the search process can begin. 
The search relies on the set of characters 
in Table 1. Explicitly, the algorithm uses 
a table called #tblTextBasedObjects that 
contains the columns ObjectID (INT) and 
ObjectTextWithoutComments (NTEXT 
or NVARCHAR(MAX)), which store the 
ID and the comment-free CREATE state- 
ment for each object, respectively. Further- 
more, the algorithm uses a temporary table 
named #tblAIIDBR outinesTablesViews to 
store all standard objects in the database. This 
table contains four columns: 

e ObjectName of type NVAR- 
CHAR (128) 

e ObjectID of type INT 

e ObjectNameForLikeSearches1 of type 
NVARCHAR (200) 

e ObjectNameForLikeSearches2 of type 
NVARCHAR (200) 


The last two columns are used for string- 
matching purposes and are populated using 
the code that Listing 4 shows. The depen- 
dency checker can now search for object 
references in a straightforward manner by 
applying the LIKE function on the pat- 
terns in #tbIAIIDBR outines Tables Views, as 


TABLE | Characters That Appear Before or After Objects’ Names in T-SQL Code 


it Pole [aoe [= ols [ol [ole fo 


a CC 


CCC CCC ja ae a 
a O e O O O 


* Can only appear before an object’s name. 


** Can only appear after an object's name. 
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Dependency Checker 


LISTING 4 Code That Populates the 
ObjectNameForLikeSearches1 and 
ObjectNameForLikeSearches2 Columns 


DECLARE @StartStr NVARCHAR(32) 
DECLARE @EndStr NVARCHAR(32) 


-- @StartStr and aEndStr hold the 
-- characters in Table1, excluding 
-- CHAR(91) and CHAR(93), which are 
-- handled separately. 


SET @StartStr = NAE + CHAR(9) + 
CHAR(10) + CHAR(13) + 
CHAR(32) + CHAR(33) + CHAR(34) + 
CHAR(37) + CHAR(38) + 
CHAR(40) + CHAR(41) + CHAR(42) + 
CHAR(43) + CHAR(44) + 
CHAR(45) + CHAR(46) + CHAR(C47) + 
CHAR(58) + CHAR(59) + 
CHAR(60) + CHAR(61) + CHAR(62) + 
CHAR(94) + CHAR(124) + 


CHAR(126) + N'J' 


SET @EndStr = N'C' + CHAR(9) + CHAR(10) 
+ CHAR(13) + 


CHAR(32) + CHAR(33) + CHAR(34) + 
CHAR(37) + CHAR(38) + 
CHAR(40) + CHAR(41) + CHAR(42) + 
CHAR(43) + CHAR(44) + 
CHAR(45) + CHAR(46) + CHAR(C47) + 
CHAR(58) + CHAR(59) + 
CHAR(60) + CHAR(61) + CHAR(62) + 
CHAR(94) + CHAR(124) + 


CHAR(126) + N'JZ' 


-- Assign values to 
-- ObjectNameForLikeSearches1 and 
-- ObjectNameForLikeSearches2. If the 
-- object's name contains an 
-- apostrophe or space, it must be 
-- referenced in valid identifiers; 
-- otherwise, it can be wrapped with 
-- the characters above. 
UPDATE #tbLALLDBRoutinesTablesViews 
SET ObjectNameForLikeSearches1 = CASE 
WHEN ObjectName LIKE N'Z%''Z" OR 
ObjectName LIKE N'% %' 
-- Handle square bracket identifiers. 
THEN N'ZCCI' + ObjectName + '1%' 
ELSE aStartStr + ObjectName + a 
EndStr 
END, 
ObjectNameForLikeSearches2 = CASE 
WHEN ObjectName LIKE N'Z''Z' OR 
ObjectName LIKE N'% %' 
-- Handle quoted identifiers. 
THEN N'Z"' + ObjectName + '"Z%" 
ELSE N'ZCLI' + ObjectName + 
aya 
END 


The code in Listings 4 and 5 illustrate 
the basic concept behind the dependency 
checker; the complete algorithm and code 
are a lot longer.-You can download the com- 
plete algorithm and code, which includes 
additional details, from the SQL Server 
Magazine Web site. 


How Io Use the Checker 

When you download the code, you'll find 
two files: Dependency Viewer2K5.sql, which 
is the dependency checker for SQL Server 
2005, and DependencyViewer2K.sql, which 
is the dependency checker for SQL Server 
2000. Here are the main parameters for 
both scripts: 
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* @IncludeAIDBObjects. The @ 
IncludeAl]IDBObjects option tells the 
script whether to consider all objects 
in the database (value of 1) or only 
user-provided objects (value of 0). If 
you set this parameter to 0, you must 
populate the temporary table #tblRe- 
questedObjects with the names of all 
the objects you want examined. 
@IncludeDependencies. If you set the 
@lIncludeAllDBObjects parameter to 
0 and provide the object names in the 
#tbIR equestedObjects table, you can 
set the @IncludeDependencies option. 
Setting this option to 1 tells the script 
to find objects that the items specified 
in #tblRequestedObjects rely on. 
@IncludeDependants. Like the 
@IncludeDependencies option, the 
@lIncludeDependants option is avail- 
able if you set @IncludeAlLDBObjects 
to 0 and provide the object names in 
#tbIR equestedObjects. Setting this 
option to 1 tells the script to find 
objects that depend on those items 
listed in #tblR equestedObjects. 


There are additional options that you can 
set. The scripts’ usage instructions describe 
them in detail. When you use the scripts, 
keep the following considerations in mind: 

Circular dependencies. Circular depen- 
dencies can occur when each item in a group 
of objects depend on each other. When 
checking for dependencies, it’s important 
for you to recognize circular relationships. 
However, for deployment purposes, there’s 
no right or wrong ordering; items that fall 
under this category can be deployed in an 
arbitrary order. 

False positives and false negatives. False 
positives refer to instances in which the 
algorithm reports dependencies that don't 
actually exist. These events occur when 
objects’ names contain the special characters 
listed in Table 1. For example, if a database 
contains the [sales orders customers] and 
[orders] tables and a stored procedure refer- 
ences the [orders] table, the algorithm will 
falsely report a dependency on the [sales 
orders customers] table as well. In spite of 
the possibility of false positives, they don’t 
affect the correctness of the deployment 
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LISTING 5 Code That Applies the LIKE 
Function to Search for Object References 


SELECT OBJECT_NAME(a.ObjectID) AS 
TextBasedObjectID, 
OBJECT_NAME(b.ObjectID) AS DependsOn 

FROM #tblTextBasedObjects a 
INNER JOIN 
#tblLALLDBRoutinesTablesViews b 
ON a.ObjectID <> b.ObjectID 

WHERE a.ObjectTextWithoutComments 
LIKE b.ObjectNameForLikeSearches1 
OR a.ObjectTextWithoutComments 
LIKE b.ObjectNameForLikeSearches2 


order and therefore are insignificant. But this 
isn’t the case with false negatives. 

False negatives refer to instances when 
dependencies remain undetected. This 
occurs in several situations, such as when 
a text-based object is created with the 
ENCRYPTION option enabled (in which 
case, the dependency checker doesn’t pro- 
cess the body text) or when the referenced 
object’s name is noncontiguous. False nega- 
tives can also occur when several objects 
have the same name yet belong to different 
schemas, in which case the DBA responsible 
for the database will run into much bigger 
problems than identifying dependencies. To 
conclude, false negatives can happen in rare 
scenarios, but SQL Server and other tools 
also overlook these dependencies. 

Size limitation. The dependency checker 
works well for analyzing dependencies and 
deployment orders for dozens or even 
hundreds of objects. However, because it’s 
written in T-SQL, which isnt too efficient 
in string manipulations, it won't work well 
for databases with large numbers of objects. 
These performance issues can be resolved 
by translating the T-SQL code to C# or by 
using full-text indexes and searches inside 
the T-SQL code. 


A Reliable Alternative 

SQL Server 2005, SQL Server 2000, and 

many third-party products don’t track object 

dependencies in a reliable fashion. The 

dependency checker is a much more reliable 

alternative. SOL 
InstantDoc ID 95428 
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DATETIME 
Calculations, Part 4 


Calculate first/last occurrences of 3 
weekday in a month 


D atetime calculations are challenging. They often involve tricky logic. In the past 
three columns, I’ve gone into depth about datetime calculations, covering various 
types, including calculating the date of the first and last days of a month based on a 
given event datetime value, and calculating the last and next occurrences of a weekday. 
This month, I present techniques for calculating the date of the first and last occurrences 
of a weekday in a month, based on a given event date—for example, calculating the 
first occurrence of a Monday in the current month or calculating the last occurrence 
of a Monday in the current month. As usual, in my examples, I use the GETDATE() 
function as the input event datetime value, but the techniques I present will work for 
any input event datetime value. In this article, I also discuss techniques to identify week 
boundaries (e.g., start and end of week). 


Calculating First and Last Weekday 

If you’ve kept up with the previous articles in the series, you're familiar with the tech- 
niques to calculate the date of the first/last day of the month, as well as the date of the 
last/next occurrence of a certain weekday. As a reminder, here’s one of the techniques 
I showed to calculate the date of the first day of the current month: 


SELECT DATEADD(month, DATEDIFF(month, Ø, GETDATE()), Ø); 


The logic of this technique is as follows: Calculate the difference in terms of months 
between an anchor date at midnight (in this case, 0, representing the base date January 1, 
1900) and the event date. Call this difference diff; Add diff months to the anchor date. 

And here’s one of the techniques I shared to calculate the next occurrence of a 
weekday, inclusive (in this example, next Monday): 


SELECT DATEADD(day, DATEDIFF(day, Ø, GETDATE()-1) /7*7 + 7, Ø); 


The logic of this technique is as follows: Calculate the difference in terms of days 
between an anchor date known to be a Monday (0, in this case) and the day before 
the event date. Call that difference diff: Divide diff by 7, multiply by 7, and add 7 to get 
the difference between the anchor date and the next Monday. Call the result new_diff- 
Add new_diff days to the anchor date to get the date of the next Monday (inclusive). 
Remember that you need to use an anchor date that reflects the weekday you're after. 
For example, I used 0 here because the integer 0 converted to datetime yields the base 
date January 1, 1900, which is known to be a Monday. If you wanted the next Tuesday, 
you would use 1 instead of 0 as the anchor date. 
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With this reminder, you're ready for 
the calculations that are the focus of this 
section. Suppose you need to return the 
date of the first occurrence of a Monday 
in this month. You might have already 
guessed that you need to combine the last 
two calculations I presented: Calculate the 
date of the first day of the current month 
(call it fmd), then calculate the date of the 
next occurrence of a Monday in respect 
to fmd: 


SELECT DATEADD(day, DATEDIFF 
(day, Ø, DATEADD (month, 
DATEDIFF (month, Ø, GETDATE 
O), Ø) -- fmd -1)/7*7 + 7, 
0); 


To calculate the date of the first Tuesday 
of the current month, simply use the 
anchor date 1 (January 2, 1900) instead 
of 0 (January 1, 1900): 


SELECT DATEADD(day, DATEDIFF 
(day, 1, DATEADD(month, 
DATEDIFF(month, @, GETDATE 
©), Ø) -- fmd -1) /7*7 + 7, 
1); 


To calculate the date of the last occurrence 
of a weekday in the month, you need to 
combine two calculations: the date of the 
last day of the month and the date of the 
last occurrence of a weekday, inclusive. As 
a reminder, here’s the technique I shared 
in the previous articles to calculate the 
date of the last day of the current month 
(call it Imd): 


SELECT DATEADD(month, DATEDIFF 
(month, Ø, GETDATE())+1, 0)-1; 


And heres the technique I shared to 
return the date of the last occurrence of a 
weekday (in this example, Monday): 


SELECT DATEADD(day, DATEDIFF 
(day, Ø, GETDATE()) /7*7, 0); 


Itzik Ben-Gan (itzik@solidqualitylearning.com), a 
mentor at Solid Quality Learning, teaches, lectures, and con- 
sults internationally. He manages the Israeli SQL Server Users 
Group, is a SQL Server MVP, and is the author of the /nside 
Microsoft SQL Server 2005: T-SQL series (MSPress, 2006). 
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TABLE | Weekly Aggregates of Sales 


Data 
totalgty 


weekend 


1992-06-21 
00:00:00.000 


1993-02-21 
00:00:00.000 


1993-03-14 
00:00:00.000 


1993-05-23 
00:00:00.000 


1993-05-30 
00:00:00.000 


1993-10-31 
00:00:00.000 


1993-12-12 
00:00:00.000 


1994-09-18 
00:00:00.000 


weekstart 


1992-06-15 
00:00:00.000 


1993-02-15 
00:00:00.000 


1993-03-08 
00:00:00.000 


1993-05-17 
00:00:00.000 


1993-05-24 
00:00:00.000 


1993-10-25 
00:00:00.000 


1993-12-06 
00:00:00.000 


1994-09-12 
00:00:00.000 


a 


Combine the two techniques to get the date 
of the last occurrence of a Monday in the 
current month (in respect to the date of the 
last day of the current month): 


SELECT DATEADD(day, DATEDIFF(day, 
D, (DATEADD(month, DATEDIFF 
(month, Ø, GETDATE())+1, @)-1) 

-- lmd) /7*7, 0); 


As before, to get the last occurrence of a 
Tuesday in the current month, simply re- 
place the anchor date (e.g., 1 instead of 0): 


SELECT DATEADD(day, DATEDIFF(day, 
1, (DATEADD(month, DATEDIFF 
(month, Ø, GETDATE())+1, @)-1) 
-- lmd) /7*7, 1); 


Of course, you can use similar techniques to 
calculate the date of the first/last occurrence 
of a weekday in a year; simply use the date 
part year instead of month. For example, to 
calculate the date of the first occurrence of a 
Monday in the current year, you can use 


SELECT DATEADD(day, DATEDIFF(day, 
Ø, DATEADD(year, DATEDIFF(Cyear, 
Ø, GETDATE()), Ø) -- fmd -1) 
/7*7 + 7, 0); 


To calculate the date of the first occurrence 
of a Tuesday in the current year, use 


SELECT DATEADD(day, DATEDIFF(day, 
1, DATEADD(year, DATEDIFFCyear, 
Ø, GETDATE()), Ø) -- fmd-1) 
/7*7 + 7, 1); 


To calculate the date of the last occurrence 
of a Monday in the current year, use 


SELECT DATEADD(day, DATEDIFF(day, 
0,(DATEADD(year, DATEDIFF(Cyear, 
Ø, GETDATE())+1, 0-1) -- lmd 
) /7*7, 0); 


To calculate the date of the last occurrence 
of a Tuesday in the current year, use 


SELECT DATEADD(day, DATEDIFF(day, 
1,(DATEADD(year, DATEDIFFCyear, 
Ø, GETDATE())+1, 0-1) -- lmd 

X /7*7, 1); 


Identifying Week Boundaries 

The calculations to identify the date of the 
next/last occurrence of a weekday in respect 
to a given event datetime value are also 
effective for identifying week boundaries 
(e.g., week start and end) that correspond 
to an event datetime value. Suppose that, in 
your calculations, you want to assume that 
a week starts on a Monday and ends on a 
Sunday. If you want to determine the dates 
of the week boundaries in respect to a given 
event datetime value (e.g, GETDATE), 
simply use the calculations for the last 
occurrence of Monday (inclusive) and the 
next occurrence of Sunday (inclusive): 


SELECT DATEADD(day, DATEDIFF(day, 
Ø, GETDATE()) /7*7, Ø) AS 
weekstart; 

SELECT DATEADD(day, DATEDIFF(day, 
6, GETDATE()-1) /7*7 + 7, 6) AS 
weekend; 


Here’s an example of a problem in 
which you need to calculate week bound- 
aries: Given the sales table in the sample 
pubs database, you need to aggregate order 
quantities (1.e., the qty column) by the week 
(corresponding to the ord_date column). 
Table 1 shows the desired result. 

The following code calculates the week- 
start date in respect to ord_date, assuming 
the week starts on Monday (0 is the anchor 
date used), generating the output that Table 
2 shows: 
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TABLE 2 Output of Query Calculating 
Week Start Dates 


‘rd_date | woekstart 
1994-09-14 1994-09-12 5 
00:00:00.000 00:00:00.000 
1994-09-13 1994-09-12 |3 
00:00:00.000 00:00:00.000 
1993-05-24 1993-05-24 |50 
00:00:00.000 00:00:00.000 
1994-09-13 1994-09-12 75 
00:00:00.000 00:00:00.000 
1994-09-14 1994-09-12 10 
00:00:00.000 00:00:00.000 
1992-06-15 1992-06-15 
00:00:00.000 00:00:00.000 
1992-06-15 1992-06-15 20 
00:00:00.000 00:00:00.000 
1992-06-15 1992-06-15 20 
00:00:00.000 00:00:00.000 
1994-09-14 1994-09-12 20 
00:00:00.000 00:00:00.000 


1994-09-14 1994-09-12 |25 
00:00:00.000 | 00:00:00.000 


1993-05-29 1993-05-24 |25 

00:00:00.000 00:00:00.000 
1993-05-29 1993-05-24 |15 
00:00:00.000 00:00:00.000 
1993-05-29 1993-05-24 |25 
00:00:00.000 00:00:00.000 
1993-10-28 1993-10-25 |15 
00:00:00.000 00:00:00.000 
1993-12-12 1993-12-06 |1 
00:00:00.000 00:00:00.000 
1993-02-21 1993-02-15 |35 
00:00:00.000 00:00:00.000 
1994-09-14 1994-09-12 |15 
00:00:00.000 00:00:00.000 
1994-09-14 1994-09-12 |10 
00:00:00.000 00:00:00.000 
1993-03-11 1993-03-08 |2 
00:00:00.000 00:00:00.000 
1993-05-22 1993-05-17 | 30 
00:00:00.000 00:00:00.000 


oOo 


on 


1993-05-29 1993-05-24 |2 
00:00:00.000 00:00:00.000 
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Solution to April's Puzzle: Naming an Heir 

A mighty king had three sons and wanted to declare the 

wisest of them as his heir. He decided to give them a logic 

puzzle to test their wisdom. He placed the sons in a trian- 
gular room, each in a different corner, and placed a hat on 
each son’s head. The king said, “You need to determine the 
color of your hat. You can’t take your hat off to look at it, 
and you can’t communicate in any way. The hat on your 

head is either green or red. At least one of you is wearing a 

green hat. I'll be waiting outside the door and will ring a bell 

every five minutes. You can’t leave the room until you know 
the color of your hat. If you know the answer, you must 
wait for the next bell ring, then come tell me the answer.” 

At the third bell ring, one of the sons opened the door and 

told the king the answer. The king said, “You're correct, and 

I’m naming you my heir. However, I’m disappointed in you. 

You have still much to learn.” What was that son’s answer, 

and why was the king disappointed? 

That son’s answer was green, based on the following 
logical deduction: 

e If there were two red hats and one green hat, the son 
with the green hat would have realized it immediately 
(by seeing both his brothers wearing red hats) and 
approached the king at the first bell ring. Because this 
didn’t happen, there’s—at most—one red hat among 
the sons. 

If there was one red hat and two green hats, each of 
the two sons wearing green hats should have seen his 
brothers wearing one red and one green hat; there- 
fore, both these brothers could have deduced that they 
were wearing green hats (since no one approached 
the king after the first bell ring, and there’s at most 
one red hat in such a case) and thus approached the 
king at the second bell ring. 

e The son that ultimately figured out the answer rea- 
soned that his brothers weren’t stupid, so if no one 
approached the king at the second bell ring, they 
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must all be wearing green hats. Of course, this tells 
you that he saw both his brothers wearing green hats. 
So, he approached the king at the third bell ring to 
say that he was wearing a green hat. 


Why was the king disappointed in his son? The answer 
involves true wisdom. The son should have reasoned that 
any setting in which (at minimum) one of the hats is green 
and not all of them are green is an unfair contest. If at least 
one hat is green, and not all hats are green, different sons can 
figure out their own hat color at different points in time. 

For example, if two of the hats are red, the son that 
wears a green hat can figure out the answer immediately 
and approach the king at the first bell ring, while the other 
two must wait to see whether someone approaches the 
king at the first bell ring (in which case it will be too late 
for them). Similarly, if one of the hats is red, the two sons 
wearing green hats can know the answer after the first bell 
ring and approach the king at the second bell ring, while 
the son with the red hat must wait to see whether someone 
approaches the king at the second bell ring (in which case 
it’s too late for him). 

If the king had favored one of the sons, he would have 
named that son his heir without a contest. Because he 
wanted to put their wisdom to test, you would expect the 
contest to be fair. The only way for the contest to be fair 
while having a minimum of one green hat is to have three 
green hats. The king expected one of his sons to approach 
him at the first bell ring with this logic. 


May’s Puzzle: The Next Element in a Series 
I got this nice puzzle from Adi Dafni. Given the following 
series of elements, can you determine the next element? 


il, Wil, Bil, Wail, Walaa, e 
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SELECT ord_date, DATEADD(day, 
DATEDIFF(day, Ø, ord_date) 
/7*7, 0) AS weekstart,qty 

FROM pubs.dbo.sales; 


You can now simply group the rows by 
weekstart, and even return the weekend date 
by adding six days to weekstart, generating 
the desired output that Table 1 shows: 


SELECT weekstart, weekstart + 
6 AS weekend, SUM(qty) AS 
totalqty 

FROM (SELECT DATEADD(day, 
DATEDIFFtday, W, ord_date) 
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/7*7, Ø) AS weekstart, qty 
FROM pubs.dbo.sales) AS D GROUP 
BY weekstart; 


Remember that if you have full control over 
which weekday is considered the first day of 
the week. The last example used Monday as 
the start of the week. If you want to consider 
Sunday as the start of the week, change the 
anchor date accordingly to 6 (representing 
January 7, 1900): 


SELECT weekstart, weekstart + 
6 AS weekend, SUM(qty) AS 
totalqty 
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FROM (SELECT DATEADD(day, 
DATEDIFF(day, 6, ord_date) 
/7*7, 6) AS weekstart, qty 
FROM pubs.dbo.sales) AS D 
GROUP BY weekstart; 


Gelting Trickier 

You're probably realizing that more and more 

of these datetime calculations involve tricky 

logic. To improve your logic, remember to 

practice the pure logic puzzles presented in 

the Logical Puzzle sidebar. SOL 
InstantDoc ID 95271 
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Set Up and support your own 
mirroring solution for high availability 


W hen you're looking for a high-availability solution, you might find that although 
third-party providers offer good solutions and complete packages, their products 
can get pretty expensive. Once you evaluate the costs, you might decide that imple- 
menting your own database mirroring solution is a no-brainer. However, implementing 
and supporting a successful mirroring solution takes some thought and planning. Sure, 
you can slap together a mirroring solution with little effort, but if you don’t have a 
handle on every aspect of your environment, you could be in for a bumpy ride. You 
must consider such aspects as network speed, security, maintenance, upgrades, and 
failover. (For information about how to decide whether mirroring is right for you, see 


e on he WEB the Web sidebar “Should You Choose Database Mirroring?” 
eecaremal Ë InstantDoc ID _95294.) If you don’t include all the neces- 
scripts at InstantDoc ID 95293 © sary aspects in your implementation, you could bring your 


and read the Web sidebar at 
InstantDoc ID_95294 


production environment down in a matter of minutes. 

So, let’s look at how to set up and support mirroring as 
an alternative to an expensive third-party high-availability solution. In this example, we 
might not need the entire server to fail over; instead, we'll assume we're working in 
an environment in which we need only a single database or two to be mirrored. What 
you'll find is that although mirroring is an excellent high-availability technology for 
individual user databases, it’s far from a complete disaster-recovery technology. For one 
thing, mirroring doesn’t protect system databases, whereas third-party products typi- 
cally do. So although you wouldn't use mirroring to fail over an entire server, it can be 
a good choice for protecting important data and it doesn’t involve the complexity of 
replication. 


Configuring Your Network 

Getting mirroring going is easy enough, but the process can reveal previously unknown 
network problems. Therefore, it’s always best to get your mirroring setup running on a 
non-production system before taking the plunge with your important data. 

First, you need to make sure you have the proper Fully Qualified Domain Name 
(FQDN) for both your primary and secondary servers. You can verify the FQDN by 
connecting to the server console either directly or through Windows 2000 Server Ter- 
minal Services. In Terminal Services, go to Start, Run, and type cmd.Then, type ipconfig 
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/all-You should see a screen like the one that 
Figure 1 shows. 

Notice in Figure 1 that the second line 
in the result set is Primary Dns Suffix. This 
setting is what SQL Server is looking for 
when setting up mirroring. You'll have to 
work out any DNS issues before you can 
set up murroring, so this setting is a good 
place to start. If the primary DNS suffix isn’t 
what you're expecting to see, the easiest way 
to fix the problem is to add the proper DNS 
suffix in the connection’ IP properties. You 
can see how to add the DNS suffix on the 
DNS tab of the Advanced TCP/IP Settings 
dialog box in Figure 2. 

Why does SQL Server require you 
to use the servers FQDN instead of the 
NETBIOS (domain name) or even the IP 
address? This security requirement lets SQL 
Server be sure it is talking to the desired 
server for Secure Sockets Layer (SSL) and 
the mirroring network validation checks. 
(You must provide a canonical name that 
can be compared to a self definition of the 
machine.) You also have to address any net- 
work problems before you have a successful 
mirroring setup. 


Advanced TCP/IP Settings 


IP Settings DNS | WINS | Options | 


DNS server addresses, in order of use: 


cx C:\\WINDOWS'\system32\cmd.exe 


C:\Documen and Settings\sean.KIDDO>ipconfig /flushdns 


Windows IP Configuration 


Successfully flushed the DNS Resolver Cache. 
C:\Documents and Settings\sean.KIDDO>ipconfig /all 
Windows IP Configuration 

Host Name >: darthcon 
Primary Dns Suffix =: Domain.com 
Node Type : Broadcast 
IP Routing Enabled. . z No 


WINS Proxy Enabled. . ......- 
DNS Suffix Search List. .... . 


: No 
z Kiddo.Poops 
gateway.2wire .net 


Ethernet adapter UMware Network Adapter UMnet8: 


: UMware Virtual Ethernet Adapter for 


.. =...» + | 88-56-56-C0-06-08 
E A E S E RT : No 
> 192.168.79.1 


> 255.255.255.0 


Subnet Mask 


Default Gateway 


vices screen for ver 


Getting Set Up for Mirroring 

Now, you can begin to set up mirroring. 

Follow these basic steps: 

e Take a full backup and a log backup of 
the primary database. 

e Restore both backups to the secondary 
server. 

e Create the endpoints. 

e Establish partnerships. 

e Start mirroring. 


This list is oversimplified, so lets walk 
through the details of the setup that you'll 
encounter in the mirroring pane of the 
database properties window and the Data- 
base Mirroring Security Wizard. Note that 
you must perform the backups and restores 
yourself. 

That’s a good place to 
start—so begin by performing 
a full backup and a log backup, 
then restore them both on the 
secondary server. You must 


Add... | Edit... | Remove | 


The following three settings are applied to all connections with TCP/IP 
enabled. For resolution of unqualified names: 


@ Append primary and connection specific DNS suffixes 
IV Append parent suffixes of the primary DNS suffix 
© Append these DNS suffixes [in order): 


Add | Edit | 
DNS suffix for this connection: [Domain com 


IV Register this connection's addresses in DNS 
I Use this connection's DNS suffix in DNS registration 


> > FIGURE 2 DNS setting tab 


restore your backups with the 
same names that they have 
on the primary database, but 
the restored databases don't 
have to be in the same loca- 
tion on the secondary server. 
In addition, its OK to run 
the log backup right after the 
full backup; you don’t have to 
leave any time between them. 

At this point, you might 
wonder why you even need 
the log backup? If you’ve 
already restored the full 
backup with norecovery, why 
take a log backup? Shouldn't 
mirroring just be able to pick 


up where the full backup left 
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off? The answer is yes and no. Although 
you could use only the full backup, SQL 
Server requires the first log backup to be 
restored to ensure that no bulk logged 
pages are in the database. The log backup 
includes any bulk logged pages and clears 
the flag that indicates the existence of bulk 
logged pages. If you took a log backup after 
the one applied to the mirror, that backup 
could include bulk logged pages that were 
generated and backed up but left no traces 
in the log. In this case, the bulk logged data 
wouldn’t make it to the mirror because it’s 
not in the log stream and no bulk logged 
bits are set. 

Once the backups are restored on the 
secondary server, we can start the wizard 
and establish the mirroring session. Right- 
click the primary database and go to Tasks, 
Mirror or go to Properties, then select the 
Mirroring tab. Click Configure Security 
to start the Configure Database Mirroring 
Security Wizard that Figure 3 shows. In the 
Include Witness Server dialog box, you'll 
almost always want to configure a witness. 
In addition to being needed for failover, the 
witness prevents what’s known as a split-brain 
scenario, in which the link between the two 
servers gets broken for some reason and each 
one thinks the other is down. In such a case, 
the secondary database will become the pri- 
mary database, and when the link is re-estab- 
lished, both databases think they're primary. 
Not pretty. Having a witness prevents this 
error. You can use any edition of SQL Server 
as your witness, so if you don’t already have a 
free server, you can install SQL Server Express 
somewhere reliable and you'll be fine. 

The principal server instance defaults to 
the server you started the wizard from, so be 
sure to start the wizard from the principal. 
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f » Configure Database Mirroring Security Wizard 


Include Witness Server 


Specify whether to include a witness server in the security configuration. 
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server instances and control the failover. 


@ Yes 
C No 


Do you want to configure security to include a witness server instance? 


To operate database mirroring in synchronous mode with automatic failover, you must 
configure a witness server instance to monitor the status of the principal and mirror 


start mirroring, as 
you can see in 
Figure 6, page 26. 
If you click Do 
Not Start Mir- 
roring, you can 
start mirroring 
any time from 
the Mirroring 
pane of the data- 
bases Properties 
window. How- 


Help | 


<Back [Lre] Finish >>I | Cancel | 


ever, if you wait 
A, too long and the 


> > FIGURE 3 The Configure Database Mirroring Security Wizard 


You also have to start the wizard from the 
right database. The wizard doesn’t have 
a place to pick the database to mirror; it 
assumes you started it from the database you 
want. You can pick your own port and end- 
point name in the Principal Server Instance 
dialog box that Figure 4 shows, but it’s best 
to accept the defaults. The only reason to 
pick something different is if the current set- 
tings are already in use or are restricted. So 
for example, if you already had an endpoint 
named Mirroring, youd have to choose 
another name. Or you might have to choose 
an alternative port if the listener port is in 
use by another service or firewall restrictions 
prevent you from using the default. If you 
don’t have any of these circumstances, there’s 
no reason to change the defaults unless you 
Just want to. 

Next, you choose the mirror instance 
in the Mirror Server Instance dialog box, 
which looks similar to the dialog box in 
Figure 4.Then, the wizard will force you to 
connect, and you'll get to make the same 
choices for the witness in the Witness Server 
Instance dialog box. 

Now, configure the service accounts 
for each server. If you leave these blank, 
as Figure 5 on page 26 shows, it’s assumed 
they're all the same domain accounts or 
are non-domain accounts. I didn’t need to 
configure service accounts for my scenario 
because I’m using the same account on all 
my servers. 

Finally, the Success dialog box tells you 
if anything has gone wrong. After you close 
the box, the wizard asks you if you want to 
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logs get out of 
synch, you'll have 
to apply another log backup and start the 
wizard from scratch. 


Backups 
Now that you have a working mirroring 
setup, you need to know how to manage 
it. Naturally, because the mirror is just a 
standby instance of the primary database, 
you don’t have to back it up. In fact, you 
cant back it up because it isn’t available for 
any kind of connection. So if you have auto- 
mated routines that back up all databases 
on the server, you'll need to exclude the 
mirror or you'll get an error every time the 
job runs. Also, depending on how you have 
your backup job set up, it might not con- 
tinue backing up any of the other databases. 
For example, if you have a stored procedure 
that cursors through sys- 
.databases, backing up each 
database as it goes, you have 
two choices. First, you could 
call the backup statement 
directly from the stored pro- 
cedure, in which case the 
backup will stop on the first 
error. So if the first database 
in the cursor is the mirrored 
instance, the procedure will 
fail and none of the other 
databases will get backed 
up. Your second choice is to 
build the backup statement, 
assign it to a variable, and 


located 


Ustener ports 


f s Configure Database Mirroring Security Wizard 


Principal Server Instance 
Specily information adou tre parve inslance rhaa Iha dalabase was ongmaby 


rnopel server nstance: 


fos sm =] 


Spacy the properties of the endpoint through which the principal server instance wil accept 
Gonnechons from the miror and vatress server instances: 


a a 
NOTE If the principal, mirror or witness are instances on the same server, 
their endpoints must use different ports. 


the context of the stored procedure, and 
although that statement will produce an 
error on the mirrored instance, it will con- 
tinue to move forward and back up the rest 
of the databases. In either case, you'll want 
to exclude the mirrored database from the 
cursor because it will always produce an 
error that you have to investigate. 


Index Maintenance 
and Upgrades 
You also have to understand the effect 
mirroring will have on your maintenance 
routines. If youre running mirroring in 
full safety mode (synchronous), all of your 
REINDEX and DEFRAG statements will 
be applied to both servers in real time. This 
requirement could easily increase your 
maintenance time by 100 percent or more. 
So if you have a strict maintenance window, 
you might want to consider changing your 
mirroring to asynchronous mode before 
you begin maintenance. Let’s look at the 
choices you have for managing mirroring 
with your maintenance routines. 

e Leave in synchronous mode (safety on). 
This choice will at least double your 
maintenance time. 

e Pause mirroring during maintenance. 
This solution is better than using 
synchronous mode if you have a 
strict maintenance window, but once 
maintenance is done and you resume 
mirroring, your mirroring session will 


IF Engrypt data serk through this endpowet 


execute the variable. This Heo | 
choice causes the backup 


statement to run outside 
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> > FIGURE 4 Dialog box for specifying principal server 
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have to resynch. The primary database 
will be available during the resynch, 
but it won't be in synchronous mode 
until the logs get caught up, so you'll 
be unprotected for a time. To pause 
mirroring, run ALTER DATABASE 
against either the primary or the sec- 
ondary server with the appropriate 
setting: ALTER DATABASE Adven- 
tureWorks SET PARTNER SUS- 
PEND. 
Switch to asynchronous mode (safety 
off). The primary database still sends 
its logs to the secondary database, but 
it doesn’t wait for the transactions to 
harden on that secondary server before 
it commits its own transactions. On 
large implementations that support 
lots of activity, this method is preferred 
because you'll still be in asynchronous 
mode while the two servers resynch, 
but the resynch time will be shorter 
than it would be if you paused mir- 
roring. Thus, you'll greatly reduce your 
exposure. To switch to asynchronous 
mode, run ALTER DATABASE 
against either the primary or the sec- 
ondary server with the appropriate set- 
ting: ALTER DATABASE MirrorTest 
SET PARTNER SAFETY OFE 
e Stop mirroring. You also have the 
option of completely stopping mirror- 
ing and re-establishing it once main- 
tenance is complete. This solution isn’t 
practical for most shops, but it is an 
option. To stop mirroring, simply run 
ALTER DATABASE against either 
the primary or the secondary server 
with the appropriate setting: ALTER 


Database Properties 


Do you want to start mirroring this database using the following settings? 


Es Comgure Database Mirroring Security Wizard 


Service Accounts 


Speciy the service accounts of the server instances 


uriisted domain 
DATABASE Mir- 
rorlest SET PART- Sernoe sccourts lot the balkong matane 
NER OFE Binapat Witness 
E ESS 
The same considerations ~ 


come into play when 
you have schema changes. 
Especially when you're 
using mirroring, you 
must test schema changes 


Il the terioa sccounts a diferent, the witerd wii create logers loi the accounts |F necessatyl 
and grant CONNECT permersons on the endpomnts tor each accourt 


E fhe server matances use diferent eccounts in the same of trusted doman es ther service 
sccourts joi SOL Server. enter the accounts below. Leave the textbowe: amply # all evetancer: 
ure the same account, the accounts are nor-doman accounts, of the accounts are in 


before you implement 
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them because some 
changes can cause massive 
log activity and you'll need 
to benchmark ahead of time. For example, if 
you add a column to a large table and fill it 
with default values, you could double your 
processing tume and the upgrade time. For 
a more detailed discussion of database mir- 
roring performance considerations, see the 
Microsoft article “Database Mirroring Best 
Practices and Performance Considerations” 
at http://www microsoft com/technet/ 
prodtechnol/sql/2005/technologies/ 


m t_pract.mspx 


Automated Monitoring 

You'll also want to monitor your mirroring 
setup to make sure your secondary server 
isn’t getting too far behind if you're running 
in asynchronous mode (safety off). To do so, 
you can use the monitoring GUI that is 
included in SQL Server 2005 Management 
Services, Service Pack 1 (SSMS SP1). This 
GUI will give you real-time statistics about 
your mirroring setup. Im not going to 
go into details about the SSMS GUI here 
because most DBAs don’t have the time 
to monitor a mirroring scenario for very 


> > FIGURE 5 Dialog box for configuring service accounts 


long. Instead, I recommend using the Perf- 
mon counters. You can add these counters 
to existing monitoring solutions such as 
MOM, OpenView, or Tivoli and set alert 
thresholds, as you would with any counter. 
You'll notice a new object in Perfmon: SQL 
Server: Database Mirroring. In the object, 
you'll find several interesting counters that 
display statistics about how many bytes get 
pushed between the two servers. The two 
counters that will be most useful to you in 
benchmarking and diagnosing performance 
issues are the Log Send Queue and Redo 
Queue counters. The larger these numbers 
get, the further behind your mirror is get- 
ting, and you should look into the root 
cause of the latency. 


The Bottom Line 

Mirroring can be a powerful tool in your 
high-availability scenario if you pay atten- 
tion to its limitations and benchmark and 
monitor it properly. Mirroring is easy to 
set up and provides zero data loss under 
the right conditions. And although it can 
complicate maintenance scenarios, it’s easier 
to support than replication or clustering. 
Mirroring can also be an attractive alterna- 
tive to the often-bloated costs of third-party 


Principal network address: TCP;//datacon.Kiddo,Poops:5022 

Mirror network address: TCP: }/data2con, Kiddo, Poops:5022 
Witness network address: TCP:/{darthcon.kKiddo,.Poops:5022 
Operating mode: High safety with automatic Failover (synchronous) 


applications. You can download the T-SQL 

scripts you need to set up mirroring and 

some common management tasks at_http:// 

swowsqlmagcom. Click Download the 
Code at InstantDoc ID_95293. SOL 

InstantDoc ID_95293 


Tf you do not start mirroring now, you can start it later by clicking Start Mirroring on the Mirroring page of 
the Database Properties dialog box. If you close the Database Properties dialog box without starting 
mirroring, the server network addresses will not be saved, (Note that any security configuration 
performed by the Configure Database Mirroring Security Wizard will be saved.) 


> > FIGURE 6 Screen showing that mirroring setup is ready 
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Combat SQL Server sprawl 


ow many database servers do you have in your organization? Three? Ten? Sixty- 

five? Database modelers, DBAs, security administrators, and CxO managers 
all speak to the problems inherent in a “server sprawl” condition. Many DBAs are 
plagued with the all-too-common scenario of one database per server; are your own 
servers struggling to beat the average server utilization rate of 7.8 percent? When you 
hear the phrase “eliminating SQL Server sprawl,” chances are the speaker is talking 
about server consolidation, but database sprawl is just as pervasive a problem—and 
consolidating databases isn’t as easy as it sounds. Consolidation implies change, and 
any kind of change to a production or development environment has the potential 
to disrupt operations. 


Physical and Logical Consolidation 
You can perform database consolidation at both the physical and logical levels. At the 
physical level, you accomplish consolidation by simply relocating a database from one 
server to another until the target server has reached some predetermined utilization 
level—say, 75 percent of CPU utilization (which leaves some room for database 
growth). This is the first type of consolidation that most DBAs undertake; the risk of dis- 
ruption is minimal because you're moving the database in its entirety, without changing 
anything at the logical level. 
The gains in server utilization 
and the cost savings that result 
from the reduction in hard- 
ware can be substantial. 
Database consolidation at 
the logical level integrates mul- 
tiple databases, which all sup- 
port one common function, 
into a single database. This pro- 
cess can challenge the planning 
and design skills of even the 
best data modelers and DBAs, 
but when executed correctly 
it will ensure a significant 
ROI in the areas of schema 
management, data integrity, 
and enhanced performance. 
Of course, whichever type of 
consolidation you choose to 


> > FIGURE | Before consolidation 
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practice, physical or logical, you’ll have 
to make changes to each of the separate 
application UIs. 

Picture this: It’s your first day on 
the IT staff of a commercial construc- 
tion company—you're the only DBA 
and you’ve just found out about the 
company’s project management system. 
The system is custom-built because it 
has some special needs that the commer- 
cial software packages don’t address, and 
youre responsible for its care and feeding. 
This project management system is com- 
posed of 26 Microsoft Access databases, 
one for each current construction project 
that the company has on the books. Each 
database contains lists of tasks required for 
the project, indicated by the name of the 
Access database. Each contains separate 
application UIs by which data is added, 
edited, and deleted. In addition, there are 
four “reference” Access databases that all 
the project databases use. The accounting 
system, which ties into all the “reference” 
databases, is hosted on SQL Server. 

For many reasons, your boss wants 
to migrate the Access databases into 
SQL Server, and he wants to consolidate 
the many project-management databases 
into just one. These project databases 
are schema clones; all 26 Access data- 
bases share a common structure and 
UI. Where they differ is content. One 
project database will contain informa- 
tion about a high-rise being built in the 
downtown area, and another will contain 
information about a new building at the 
university. Every time the end users (e.g., 
project managers, project coordinators) 
request a new feature, the schema changes 
and UI application modifications must be 
made 26 times—once for each of the 26 
individual project databases. 


Michelle A. Poolet (mapoolet@sqlmag.com) is 

a contributing editor for SQL Server Magazine and cofounder of 
Mount Vernon Data Systems LLC who teaches database design and 
programming. She has a Masters degree in Computer Information 
Systems and is an MCP. 
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Start at the Top 


In this case, you’d want to start at the top, 
with a system diagram, similar to the one 
that Figure 1, page 27, shows. The system 
diagram lays out the major components as 
they exist at the start of the consolidation 
project. The individual project-manage- 
ment databases appear in color and are 
labeled Project A through Project Z. Each 
of these project databases has links to—and 
uses data stored in—the “reference” data- 
bases: People, Contracts, Inventory, and 
Human Resources. In turn, these four data- 
bases feed data into and get data from the 
Accounting database. The actor symbols on 
the left side of Figure 1 represent the field 
users—project managers, project coordina- 
tors, and field supervisors—and the actor 
on the right represents the office staff that 
gathers statistics and generates reports. Each 
actor symbol represents about 10 people. 

To consolidate all these databases, you'll 
want to analyze the schema of each of the 
databases. Each one of the databases that 
you see in Figure 1 is composed of a set 
of interrelated tables. If there’s any overlap 
between tables in business meaning and 
data, this overlap should be resolved with 
some careful redesign. Referential integ- 
rity that currently doesn’t exist will have 
to be established when all these table sets 
are relocated into one database. There’s a 
high probability that you'll be defining 
new keys—primary and foreign—and new 
indices to enhance performance. Also, don’t 
forget to take into account all the appli- 
cation UI modifications that'll be made 
because of the consolidation. 


Where’s the Master 

Project Table? 

There is no master list of project names in 
this scenario, so the first new object you'll 
want to create is a Master Project table. 
Figure 2 is a high-level graphical repre- 
sentation of the SQL Server database after 
you've converted all the Access tables into 
SQL Server tables and migrated the Access 
data into SQL Server. The Master Project 
table will contain a project identifier or 
number, a project name—such as “Hos- 
pital” or “University Building”—a project 
start date, the estimated project duration, the 
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actual project dura- 
tion, and any other 
data that’s pertinent 
to the project. 

Each table con- 


tains data for all 
the projects, so it’s 
conceivable that— 
depending on the 
number of rows | 
within each table— PRO 
you might want to 


COUNTING 
TABLES 


INVENTORY 
TABLES 


| HUMAN 
RESOURCES 
| TABLES 


consider horizontal 
partitioning. Fin- 
ished projects could 
be moved into a set of tables that comprise 
an archive or history of projects completed. 
Extremely large or long-running projects 
could have their own set of project tables. 
Retrieving data from projects that share 
space in the project tables could be opti- 
mized by creating materialized views. 

In Figure 2, the set labeled People Tables 
are those tables that were once part of the 
Access People database. Likewise, the sets 
named Contract Tables, Inventory Tables, 
and Human Resources Tables all corre- 
spond to the Access databases of the same 
name. The Accounting Tables, if you recall, 
were originally stored in SQL Server. I 
haven't indicated any relationships between 
these sets of tables, but you can rest assured 
that there will be many. 


Cleaning Up 

Importing all this disparate database con- 
tent into one SQL Server database is going 
to mean a lot of maintenance for the 
Uls—almost every one, with the excep- 
tion of the UIs for the Accounting system, 
will have to point to a new data source. In 
the Access version of the project databases, 
each project database had links to other 
Access databases, and correct behavior of 
the Uls was dependent in part on ODBC 
connections. This scenario will change; 
all UIs will now look to the SQL Server 
database as their data source. Depending 
on your company’s philosophy, you might 
want to wait until the back end (the data- 
base) is stabilized before you start restruc- 
turing the Uls. Eventually, you’ll want 
to rewrite the UIs, replacing the Access 
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> > FIGURE 2 After consolidation 


client-server scheme with something that’s 
Web-enabled, and incorporating additional 
features and functionality made possible 
by the database consolidation. UI mainte- 
nance should become significantly easier 
with this move; you'll no longer need to 
update 26 individual project computers 
with each change to a table or a UI. Project 
teams in the field will be able to access data 
from handheld devices (if that’s a direction 
the company wants to move in), rather 
than having to haul around computers and 
synchronize .mdb files every night. Con- 
solidating databases, in this case, has opened 
up a world of opportunities. 


Worth the Time 

Taming sprawl through database consolida- 
tion is an approach that many companies 
are starting to evaluate. Consolidating data- 
bases isn’t as easy as it might seem at first, 
but careful planning and some judicious 
redesign can enhance the usefulness of the 
databases and availability of the data. One 
unavoidable chore that is a result of data- 
base consolidation will be modifying and 
possibly upgrading the UIs. However, con- 
solidating databases, especially from isolated 
desktop environments to a centralized SQL 
Server infrastructure, can offer the potential 
for increased data availability and new ways 
of doing old tasks. 

You can post your thoughts and con- 
cerns about database consolidation, and its 
potential for business efficiency, at the SQL 
Server Magazine Database Design forum, 


http://www.sqlmag.com/go.dbdesign. Er 
InstantDoc ID 95405 
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Gather information automatically 


uppose that you hire a new DBA and assign him or her to a 

large-scale SQL Server infrastructure that consists of more 
than 100 servers divided not only by purpose and criticality but 
also by geography. This new DBA would likely spend the entire 
first week getting to know the lay of the land—using SQL 
Server Management Studio (SSMS) to connect to each SQL 
Server machine, one by one, to gather essential information 
such as version, edition, server configuration, existing databases, 
and scheduled backup jobs. A daunting task indeed. 

I learned early in my career that spending time up front 
to automate otherwise manual and time-consuming tasks 
can preserve your sanity. I therefore developed a fairly simple 
solution that connects to each available SQL Server machine, 
pulls information into a central repository database, and feeds 
the combined data to a report for DBAs 
and other IT staff to use. In this article I 


Download the Web listings at 
InstantDoc ID 95385 and read 
the Web-exclusive sidebar at 


tw. 


InstantDoc ID_95387 


glmag.com 


SSS 


describe the solution I used. (Note that the solution I offer is 
intended to enhance, not replace, a preexisting full monitoring 
and alerting solution.) 


Tool Time 

Several tools are available to the intrepid DBA setting out to dis- 
cover the uncharted server landscape. For example, you can use 
the Microsoft SQL Server Health and History Tool (SQLH2; 
available at _http://www.microsoft.com/downloads/details 


.aspx?familyid=eedd10d6-75f7-4763-86de-d2347b8b5f89&di 
splaylang=en) to populate a repository database. However, this 


tool is outdated and not very flexible. (For more information 
about SQLH2, see “SQL Health and History Tool,” November 
2006, InstantDoc ID_93544.) 

For my project, I employed the command-line tool 
Sqlcmd, SQL Server Integration Services (SSIS), a SQL Server 


database for the repository called DBA_ 
Rep, and SQL Server 2005 Reporting 
Services (SSRS). (For more information 
about Sqlcmd, see the Web-exclusive 
sidebar ““Sqlcmd,” http://www.sqlmag.com, 
InstantDoc ID 95387; for related articles 
about SQL Server 2005 and SSIS, see the 
Learning Path.) In this article, I explain 
how to use Sqlcmd and SSIS to construct 
and populate the DBA_Rep repository 
database. In a later article Pl explain how 
to build and deploy the three SSRS reports 
designed to deliver the data from this 
repository. (Note: This article makes several 
nonstandard recommendations, such as 
querying system tables directly, employing 
pseudo temp tables, and using xp_cmdshell 
to run commands. If you stringently adhere 
to best practices and use only supported 
techniques, you'll need to withhold your 
judgment temporarily—until you see that 
the nonstandard methods I use are efficient 
and aren’t detrimental.) 


Create the Repository Database 
Now that you have a list of servers to use as 
input, you might wonder how you can use 
that input directly in an SSIS package. But 
don’t get ahead of yourself—first, you must 
store the information somewhere. As most 
DBAs know, the best place to store a list 
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of data is in a table. 
Before we examine 


the SSIS package, ee eee 


let’s take a look at I 


the database that will Yj teom ee 


be the repository for 
the SSIS load. 

The table that [> 
will store the list |- 
of servers on the | 
network from the |7 
Sqlcmd command 
is called ServerList_ 
SSIS. In addition to 


this table, six other 
base tables in the 
DBA_Rep database 
store data to give a DBA enough basic 
information at a glance to answer some fun- 
damental questions about the state of each 
server. These tables are SQL_Servers, Data- 
base_Info, Databases, Backup_History, Jobs, 
and Jobs_Archive. Each of these six tables 
holds specific information about each SQL 
Server instance. Web Listing 1 (http://www 
sqlmag.com, InstantDoc ID 95385) contains 
each table’s schema; this listing also serves as 
the script to build the database for the SSIS 
package to populate the database. To run 
this script successfully, you need to create 
a blank database called DBA_Rep in your 
SQL Server 2005 instance. After you create 
the DBA_Rep database, the script that Web 
Listing 1 contains will create the necessary 
tables in the database. 

The table ServerList_SSIS is empty, 
waiting to be filled with server names. 
Although I didn’t automate this process, you 
can use SSIS techniques similar to the fol- 
lowing techniques that I discuss to easily do 
so. I used the special stored procedure xp_ 
cmdshell to run the Sqlcmd shell to return 
the server list. Assuming that you’ve run the 
script to create the DBA_Rep database that 
contains the ServerList_SSIS table, you can 
easily use an Insert Into statement with the 
Sqlcmd /Lc command to insert the server 
data. In SSMS, open a new query window 
and enter the following command: 


USE DBA_Rep 
GO 


Insert Into ServerList_SSIS 


SQL Server Magazine 


> > FIGURE | Full package to populate the repository database 


Exec xp_cmdshell 'sqlcmd /Lc' 


At this point you need to consider the 
following caveats. First, the result set returns 
NULL records. The table can accommodate 
NULL records, and the SSIS package’s logic 
will in turn filter out these rows. You could 
build in logic to take care of the NULL 
values before the insertion, but I chose to 
do it as part of the SSIS package. Because 
this table has no defined indexes that require 
unique values, truncating the table also 
ensures that no duplicate rows occur each 
time the Sqlcemd /Le command loads the 
table. You also need to ensure that xp_cmd- 
shell is configured with the proper permis- 
sions to execute. By default, xp_cmdshell is 
disabled in SQL Server 2005; you can use 
the SQL Server Surface Area Configuration 
tool to enable xp_cmdshell. 


Obtain the Data 
The information that’s most important to 
DBAs on a day-to-day basis is server infor- 
mation, such as version (e.g., SQL Server 
2005 or 2000), service pack level, edition (Le., 
Standard or Enterprise), and default collation. 
Fortunately, SQL Server 2005 and 2000 store 
this information in handy system tables. My 
solution queries each server and stores this 
information in the SQL. Servers table. 
Database information is just as impor- 
tant or even more important than server 
information; crucial database information 
includes the two tables Databases and Data- 
base_Info. The Databases table simply holds 
the name of the server and the name of the 
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database. Database_Info holds details, such 
as size, file location, and recovery model 
for each database. You can use the Server 
field to join the Database_Info table to the 
Databases table. (For ease and simplicity, I 
used the Server field for joins rather than a 
key field or a server ID field that I would 
need to manually update. Although this 
solution wasn’t ideal, it was sufficient for my 
purposes.) 

The final three tables store information 
about SQL Server Agent jobs and database 
backups, which I believe is the most impor- 
tant information for any DBA to have. For 
example, knowing which jobs are failing 
and need attention is imperative when 
you're working with hundreds or even tens 
of database servers and databases. Jobs often 
fail—and because most jobs perform routine 
full and transaction log backups if they fail, 
response must be swift. The Backup_His- 
tory table holds detailed information about 
backups that occurred in the past n days. 
I’ve found that 5 is a good number of days 
of history to analyze. 


Populate the Repository 

Now that you have the list of SQL Server 
machines, and the repository database is 
defined to store information about these 
servers for databases, jobs, and backup his- 
tory, it’s time to use SSIS to populate this 
simple repository. SSIS might seem like a 
complex design environment if you've never 
used it. Many DBAs use DTS for SQL 
Server 2000 to cut their teeth for extraction, 
transformation, and loading (ETL). 

Figure 1, page_30, shows the full package 
that I used to populate the repository data- 
base. This package consists of three areas: 
(1) migrating and/or truncating repository 
tables to maintain the repository, (2) popu- 
lating a variable with an ADO record set of 
server names derived from the command- 
line utility Sqlcmd, and (3) using this variable 
to programmatically connect to each server, 
one by one, and pull information into the 
repository. You can download this solution 
from SQL Server Magazines Web site. (Go 
to_http://www.sqlmag.com, enter_95385 in 
the InstantDoc ID text box, then click the 
95385.zip hotlink.) The solution is called 
DBA_Server_Load and the SSIS package it 
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contains is called Populate_DBA_Rep.dtsx. 
You can use Business Intelligence Develop- 
ment Studio (BIDS) to open the solution 
and package. 

Truncating tables and migrating data to 
archive tables occur first in the SSIS package. 
The Execute SQL Task objects that run 
the Truncate Table statements are grouped 
together in a sequence 
container at the top of 
the package. When the 
package runs, all the tables 
are initially truncated; the 
only exception is the Jobs 
table. Before truncating 
the Jobs table, a Data 
Flow task moves the data 
from the Jobs table to 
the Jobs_Archive table. I 
wanted to maintain a his- 
tory of job successes and 
failures to analyze over 
time. The other tables 
need the most current 
data—and in my opinion, 
starting fresh each time 
for this semi-static infor- 
mation is cleaner. As I P3 
mentioned previously, 
Pm pulling 5 days’ worth 
of backup history that 
will repopulate with each 
run. Figure 2 shows the 
dialog box to configure 
the Execute SQL Task 
object to truncate the 
SQL_Servers table. When 
all the objects in the Tiun- 
cate Tables and Populate 
Archives sequence con- 
tainer complete success- 
fully, the package moves 
to the second phase, 
which is to populate a 
variable with an ADO 
record set. 

Before I explain how 
to populate a variable 
from a SQL Server query 
to use within an SSIS 
package, let me explain 
why you might want to 
do so. If you have fewer 
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Cortegure the properties recpared te run SOL staternents and stored procedat unng the selected 


than 10 servers, creating a separate con- 
nection to each server might be feasible. In 
SSIS you'd have 10 Connection Managers, 
each pointing to one SQL Server machine. 
More than 10 servers is problematic, but a 
tenacious DBA might be willing to create 
separate connections for, say, 20 servers—as 
long as no additional servers are expected, or 
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FIGURE 2 Configuring the Execute SQL Task object to 
truncate the SQL_Servers table 
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FIGURE 3 Configuring the Execute SQL Task object for 
the Populate ADO Variable task 
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Figure 3 shows the 
dialog box to con- 
figure the Execute 
SQL Task object 
for the Populate 
ADO Variable 


System.Object 
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[variables variables |3 


> > FIGURE 4 The SRV_Conn and SQL_RS variables 


task, including the 
Connection, SQL- 
SourceType, and 
SQLStatement prop- 


EE Execute SQt Task Editor 


Config.re the properties required to run SQL statements and stored procedures using the selected 
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erty values. A direct 
input query to the 
DBA_Rep database 
resides on the local 
SQL Server machine 
named QASRYV, 


> > FIGURE 5 Setting the result name and variable name 


which is also where 
the package will run. 


To use the query 
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the package, which 
uses Foreach Loop 
container objects. 
The five Foreach 
Loop containers 


(.e., one for each 


> > FIGURE 6 Setting the enumerator and ADO object 


source variable 


the DBA is willing to manually add servers 
and maintain the package indefinitely. In 
my case, I had more than 100 servers, so I 
needed a better solution. 

The task to populate the variable uses 
a simple SELECT statement to query the 
ServerList_SSIS table containing the server 
names previously derived from Sqlcmd.The 


query is: 
SELECT RTRIM(Server) AS 
<servername> 
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database table in the 
DBA_Rep database) 
employ a combina- 
tion of techmiques to meet the defined goal 
of programmatically changing a connection 
string iteratively for each server and executing 
a Data Flow object to retrieve server-specific 
information. 

To open the Variables toolbar, right-click 
anywhere in a blank area of the package 
and select Variables from the list. Figure 4 
shows the Variables toolbar with two defined 
variables of two different data types. The 
first, SRV_Conn, is a simple string variable. 
The second, SQL_RS, is an object data type. 
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These distinctions are important. Because 
the result set from the SELECT statement 
contains multiple records, a string variable 
doesn’t work. I needed to use the SQL_RS 
object with a value of System.Object to 
hold the results, then map the two variables, 
object to string, in the Foreach Loop con- 
tainer. I used the following four simple steps 
to accomplish this task. 

1. Under Result Set in the Populate ADO 
Variable task, set the Result Name to 0 and 
the Variable Name to User::SQL_RS, as 
Figure 5 shows. 

2. In each Foreach Loop container, set 
the enumerator in the Collection area to 
Foreach ADO Enumerator, and set the 
ADO object source variable to User:: 
SQL_RS, as Figure 6 shows. 

3. Under Variable Mappings in the 
Foreach Loop containers, set the Variable to 
User::SRV_Conn and the Index value to 0, 
as Figure 7 shows. 

4. Use a property expression in the Con- 
nection Manager object to assign the string 
variable to one Connection Manager.When 
the Foreach Loop container executes the 


Í] Foreach Loop Editor 


f The Foreach Loop container allows execution iteration over an enumeration. 


General 
Collection 
Variable Mappings 
Expressions 


Select variables to map to the collection value, 


Variable [inter | 
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Data Flow tasks it contains, the connection 
string is dynamically built with each enu- 
meration of servername. In this package, the 
Connection Manager called MultiServer 
serves this purpose. Setting the variable to 
the ServerName property, as Figure 8, page 
33, shows, causes the connections to set 
themselves correctly for each server. 

After the variable mappings are in place, 
you can use Data Flow objects within each 
Foreach Loop container to load the tables. 
You can place Data Flow objects on the SSIS 
package’s Control Flow tab, but these special 
objects have their own tab on which you can 
define their properties and sequencing. In 
general, a Data Flow task consists of a source 
and a destination object. In my solution, 
both the source and destination are OLE 
DB connections to a SQL Server machine. I 
configured the source to use the MultiServer 
connection that would enumerate through 
the list of servers, and I configured the local 
DBA_Rep connection as the destination 
to hold the data. Source and destination 
columns are mapped together. The source 
can be an object, such as a table or view, or 
as in my package, it can be a SQL query to 
be used as a derived table. 

For four of the five tables, I sent one 
query to select values and loaded the results 
from the remote servers into the local 
DBA _Rep database. To examine the tables’ 
source queries, right-click the Data Flow 
object within the Foreach Loop container 
and select Edit. Then, on the Data Flow 
table, right-click the OLE Data Source 
object and select Edit again to display the 
source query. Web Listing 2 contains the 
query to load the Databases table. 

I needed to handle the Database_Info 
table, which loads detailed information 
about each database, a bit differently. Because 
the SQL Server 2000 Master database stores 
basic information about each database in 
the sysdatabases tables, and each database 


stores the remaining important information, 
I needed to query each database individu- 
ally. I could have used the stored procedure 
sp_MSForEachDB, which has been available 
since SQL Server 7.0, to easily accomplish 
this task. However, using this stored pro- 
cedure doesn’t return a solitary result set. 
I needed a full result set, so I considered 
other alternatives. Using temp tables or table 
variables would have given me the full result 
set I needed, but setting up and maintaining 
temp tables is difficult and requires special 
considerations. For example, you need to 
create the temp table beforehand, and you 
must set a value to retain the connection. 

My solution was to query each database 
to return database-specific information in a 
table that I created in the TempDB database. 
The table I created wasn’t a true temporary 
table with a # or ## prefix. Although the 
table resides in the TempDB database, its 
size and location have minimal effect on 
the source server. Web Listing 3 contains 
the code to create and populate this table, 
called HoldForEachDB. Notice the syntax 
of the sp_MSForEachDB stored procedure, 
which substitutes a question mark for the 
database name. This command is fairly 
useful, without requiring you to wrap logic 
into cursors to provide similar functionality. 
(Note: When you save the task, you might 
receive a message such as “The DROP table 
construct statement is not supported.” How- 
ever, the query will still delete the table and 
repopulate it with crucial database informa- 
tion such as status, recovery model, size, and 
updateability.) 


Troubleshooting 

The package runs on both SQL Server 2005 
and 2000 servers. If no errors generate, the 
package will run in less than 2 seconds for 
2 servers and in less than 2 minutes for 30 
servers with varying amounts of data. If the 
package runs successfully and each package 


object turns green, you're ready to query 
the repository with custom queries and 
reports. Even if the package objects don’t 
all turn green, don’t worry—each task has 
a maximum error value that you can set to 
let the package continue running even if 
it can’t connect to an individual server, for 
example. 

Servers fail to connect for various rea- 
sons—for example, they might not be 
online, or the user running the package 
doesnt have access. You can use event han- 
dling to send an email message or write to a 
log to report server failures. Event handling, 
logging, and scheduling the SSIS package for 
automatic runs are beyond the scope of this 
article. For more information about these 
tasks, see SQL Server Books Online (BOL). 

After you load the database, you can run 
a simple query such as the following: 


SELECT [Server] 
,LProductVersionJ 
,LProductLevelJ 
,LisIntegratedSecurityOnly] 
,LEngineEdition] 
,LCollation] 
,LisClustered] 
,LisFullTextInstalledJ 
,LSqlCharSetJ 

FROM CDBA_Rep1.Cdbo1.CSQL_ 
Servers] 


This query provides the data that Table 1 
shows, which includes useful information 
about the two servers I used to test the 
package. 


You've Got the Power 
The package I created is just one example 
of what you can do with SSIS. Combining 
the variable and expression values gives you 
unprecedented control and efficiency in 
designing packages. For my package, I used 
variables and expressions to programmati- 
cally control a connection string to query 
multiple servers without needing to 


TABLE | Server Information manually define each connection. EM 
. - = - InstantDoc ID_95385 
Server Name Version Level Security Edition Collation Clustered | Full-Text 
UAFCQASRV4 8.00.2039 SP4 NULL Enterprise SQL_Latin1_General_ Not Full-text is 
Edition CP1_CI_AS clustered installed Rodney Landrum ( Le 
UAFCQASRV5 9.00.2047.00 | SP1 Not Standard SQL_Latin1_General_ Not Full-text is _net) is a senior SQL Server DBA and author living in 
integrated Edition CP1_CI_AS clustered installed Pensacola, Florida. His most recent book is Pro SQL Server 
security 2005 Reporting Services (Apress). 
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Are you ready? 


0 ne of the biggest trends I’m currently 
seeing with my clients is the desire for 
a consolidated SQL Server environment. 
However, just because my clients think they 
need to consolidate doesn’t mean they're 
ready to do so, or even necessarily that they 
should. Before you consider implementing a 
consolidated SQL Server environment, you 
must assess your existing environment to 
determine whether planning a consolida- 
tion is even possible—or worth your time. 
(Because the analysis and assessment can be 

time consuming, you 
e oniihe WEB might want to ae 
Read the Web-exclusive 


sidebars at InstantDoc IDs ¥ help; see the sidebar 
95465, 95466, and 95467 “Getting Help” for 


i ` 


solidation 


more information.) This article outlines the 
steps you need to take to gather the infor- 
mation necessary for making such an assess- 
ment. The solution that I discuss involves 
workload consolidation; for an alternative 
consolidation solution, see the Web-exclu- 
sive sidebar “Different Paths to Server 
Consolidation: Workload vs. Virtualization,” 


http://www.sqlmag.com, InstantDoc ID 
95465. 


Driving Factors 

Consolidation tends to be a top-down 
rather than bottom-up business decision. 
In most cases, upper management wants 
to do more with less and hopes to recover 


Analyzing and assessing your environment can be a daunting task. Don’t be 
afraid to hire someone to assist in the process, especially if you're already over- 
loaded with day-to-day duties—the money will be well spent. DBAs often work 
extra hours just to stay above water; an expectation from management that 
you can complete a consolidation assessment, analysis, and proposal in a short 
amount of time (which is usually how the request is made) is unrealistic. 
Hiring someone to help with your consolidation project can have other ben- 


efits also. Someone from the outside might be able to gain access to resources 
such as application or business owners that a DBA doesn’t typically have access 
to, giving the final analysis more points of information. In addition, an outside 
party is likely to be impartial and can be objective in creating the final document 
that specifies what you can consolidate and what must remain as a standalone 
installation. 
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some costs. Thus, consolidating SQL Server 
is only part of a much larger transformation 
within a company. Another case in which 
a company might push for SQL Server 
consolidation is if its data center is at or near 
capacity and building a new data center isn’t 
an option. However, the biggest driver for 
consolidation that I see on a regular basis is 
a situation you might be familiar with: SQL 
Server sprawl. 

SQL Server has always been a Trojan 
horse in many environments—it sneaks in 
through the back door and becomes perva- 
sive because it’s so easy to install and deploy 
whenever an organization needs a database 
server. You can install SQL Server on every- 
thing from mobile devices through servers. 
A side effect of this phenomenon is that 
it leads to a mentality of one database per 
instance, with that solitary database housed 
on a standalone server that’s typically under- 
utilized and/or has poor availability. In some 
cases, a company’s lone SQL Server machine 
is a desktop machine that was never meant 
for production. But because people started 
using the instance and liked the solution 
built on top of the database, that desktop 
machine became mission critical. 

These days, you might be able to gain 
more significant performance increases for 
your SQL Server installations with a shelf 
full of blade servers than you can with a 4U 
server you purchased only three years ago. 
Because many companies house terabytes 


PREY 


QUIRED TPA 
READING 


May 2007 35 


REQUIRED 
[P REQUIRED i 


SQL Server Consolidation 


of data that require management, you need 
to carefully consider everything from hard- 
ware purchases to operations. Consolidation 
means more than just stacking multiple 
SQL Server instances or putting multiple 
databases in one instance. You need to take 
into account the amount of cooling needed 
to ensure that the servers don’t overheat, as 
well as the overall energy requirements for 
the multiple servers in your data center. 

To do consolidation right, you'll most 
likely need to buy new servers. Assuming 
that you can use your existing servers might 
be a bad idea. If you reconfigure the servers, 
you won't have a true fallback plan after you 
consolidate. Instead, consider repurposing 
existing hardware to refresh development, 
quality assurance (QA), or staging environ- 
ments. As you assess your consolidation 
strategy, you must decide how much and 
what type of new hardware you need and 
determine the cost. 

In addition, ask yourself whether each 
instance in your environment was deployed 
and is administered in the same manner. 
In my onsite experience with both large 
and small customers, not a single one has 
consistently deployed SQL Server in exactly 
the same manner—despite the fact that each 
organization has posted standards. Managing 
these environments is a nightmare for DBAs. 
If each instance has different sort orders, ser- 
vice pack levels, database options, configura- 
tion settings, and other differences, you must 
remember all the variations or you might do 
something on one server that would be det- 
rimental to another. Consolidation can assist 
in solving these problems; in fact, it’s often 
the catalyst to standardize deployments and 
other paradigms (e.g., operations, processes, 
administration) in an environment. 


What You Don’t Know 
Will Hurt You 


The first and perhaps most important thing 
you need to begin consolidating your envi- 
ronment is information. You need more 
than just a day’s worth of Performance 
Monitor statistics, because a day’s worth of 
counters won't give you enough informa- 
tion on which to base decisions for a pro- 
posed consolidation. In addition, numbers 
tell only part of the story. 
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First, compile a complete list of the 
known servers or other machines on 
your network with SQL Server installed. 
Although most chents think they know their 
entire SQL Server inventory, they actually 
don't. After you have this list, use a third- 
party tool (e.g., Quest Software’ Reporter) 
or a homegrown utility (using Windows 
Management Instrumentation—WRMI) to 
query all of the machines on the network 
that have any kind of SQL Server resource 
(including products such as Microsoft SQL 
Server Desktop Engine—MSDE). Then, 
compare the lists to see whether any new 
SQL Server installations were discovered. If 
the lists don’t match, you need to determine 
who owns the newly discovered instances, 
what theyre used for, and why you didn’t 
know about them. 

Next, record all aspects of SQL Server 
(e.g., physical machine information such as 
processor type and speed, brand, and model; 
all OS settings, including driver versions, 
service packs, and patches; instance configu- 
ration, including all server-level options and 
settings, code page, security, jobs, and DTS or 
SQL Server Integration Services—SSIS— 
packages; database configuration, including 
database options, database size, amount of 
free space, and size of each file). Record 
this information for every discovered SQL 
Server machine and instance. This detailed 
information will ultimately help you deter- 
mine which databases you might be able to 
combine with others on a specific instance 
of SQL Server, as well as which databases 
will work well together. The information 
you document will influence many aspects 
of your consolidation assessment and plan, 
including your decision regarding the final 
instance standard (e.g., the version of SQL 
Server, which forms of high availability you 
might deploy). The documentation will also 
serve as a reference if you move a database 
and someone needs to know where it came 
from and whether you moved, reconfigured, 
or recreated objects such as jobs, DTS or 
SSIS packages, and replication. This might 
be the only chance to capture a complete 
snapshot of your SQL Server environment. 

Although most DBAs don’t track his- 
torical information about each database, you 
need this information (e.g., database usage, 
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SQL Server Consolidation 
Planning Checklist 


e Consider your organization’s 
motivations for consolidation. 

e Gather pertinent data about your 
environment, such as SQL Server 
installations, database configuration 
and administration information, 
historical information about each 
database, and performance statistics. 

e Gather information about your 
applications, such as security model, 
tested SQL Server version, and 
dependencies. 

Assess your environment for 
readiness in implementing a 
consolidation solution, including 
organizational preparedness and 
financial resources. 

e Create a document that outlines 
your proposed consolidated 
environment. 


growth patterns—including size of database 
backups). For example, if a database has 
been in production for two years, knowing 
its growth pattern can help you predict the 
kind and amount of storage space you might 
need in the future. Merely making best guess 
estimates isn’t an effective method for sizing 
an environment. If you don’t have accurate 
information, you might wind up under- 
sizing or grossly oversizing your proposed 
environment. Undersizing means you'll run 
out of capacity quickly, possibly even before 
you finish the consolidation; oversizing 
might cost your company money that could 
have been better spent elsewhere. 

You can’t turn back the clock and create 
information that doesn’t exist, so do your 
best to track down the information you 
need. If you're lucky, the information might 
exist elsewhere, such as in a central moni- 
toring system that has been monitoring disk 
counters for two years but that you didn’t 
have access to. Talk to your systems adminis- 
trators to see if they have information about 
your database servers that you don’t have. 

In addition to having growth informa- 
tion, you need to understand each database, 
instance, and servers performance. Your 
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company most likely has a centralized tool 
such as Microsoft Operations Manager 
(MOM), NetIQ AppManager, or Com- 
puter Associates’ (CA5) Unicenter that’s 
gathering these statistics. If not, you can use 
Performance Monitor counters to find this 
information. Before you set up your own 
monitoring, make sure you aren't duplicating 
any efforts. Although you could gather sta- 
tistics from every available counter, doing so 
would be overkill—you'd have a volume of 
information that you wouldn’t know what 
to do with. Instead, gather information from 
a select list of counters that gives you the 
biggest bang for your buck. The Web-exclu- 
sive sidebar “Using Performance Monitor 
to Record Counters,” hhttp://www.sglmag 
com, InstantDoc ID_95466, explains how to 
obtain the information you need. For SQL 
Server counters, you can also query SQL 
Server 2005’s Dynamic Management Views 
(DMVs) or SQL Server 2000's sysperfinfo. 

The statistics you need to pay attention 
to on the OS level are CPU utilization (for 
the server overall, as well as for each SQL 
Server instance; also, per processor, as well 
as the sum for all processors), physical disk, 
network throughput, and memory utiliza- 
tion (for both the server itself and for SQL 
Server). What to capture for SQL Server is 
a bit more subjective because you might 
want to monitor specific information for 
your organization. Two common pieces of 
information to capture are transactions per 
second (tps) for each database and the total 
number of user connections. In an assess- 
ment, you're most worried about informa- 
tion that gives you a specific result. In this 
case, you're not tuning performance; you're 
simply gathering information about your 
servers. 

Pay particular attention to disk I/O 
because this information will help with your 
eventual storage configuration by showing 
you what your current disk performance 
looks like. Poor disk configuration, whether 
from performance or space issues, is arguably 
the most common problem I see at nearly 
every client. If you aren’t already capturing 
performance information, start doing so—at 
all times of the day (e.g., busy, slow, week- 
ends). You need reliable numbers to make 
solid consolidation recommendations. 
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Applications 

One of the most important pieces of data 
you need in your consolidation arsenal is 
information about the actual applications. 
You might wonder why a DBA would 
care about applications—but you should 
care, because the applications largely dictate 
how you deploy the databases. You need to 
compile a list of all your applications, their 
owners, their service level agreements (SLAs), 
and their use in the environment. Two kinds 
of applications exist: those you buy and those 
you develop inhouse. Although some of 
the information you need about each type 
of application is different, many aspects are 
common to both types. 

For example, you need to know the 
application’s security model (Windows or 
SQL Server authentication) and whether 
combining the application’s databases will 
work with other application databases youre 
considering pairing the original application 
with. For example, suppose you have two 
applications that use SQL Server authenti- 
cation and have a login named Sally. In one 
application, Sally requires systems adminis- 
trator (sa) privileges in SQL Server, whereas 
in the other application, Sally just has read- 
only access to its database. Because Sally’s 
rights already include sa access, you'll most 
likely be escalating rights unnecessarily for 
Sally in the second application. This situation 
makes the two applications poor candidates 
for consolidating on the same SQL Server 
instance. 

You also need to know which version 
of SQL Server an application has been 
tested against. This consideration 1s especially 
important because many DBAs will want to 
upgrade to SQL Server 2005 as part of the 
consolidation process. Not all third-party 
vendors have certified their applications 
against SQL Server 2005—and even if they 
have, they might not have certified applica- 
tions against SQL Server 2005 Service Pack 
1 (SP1) or SP2. So if you determine that 
your standard will be SQL Server 2005 SP2 
and an application supports only the release 
to manufacturing (RTM) version, you either 
need to leave the application as is or deploy 
it in a consolidated manner using the RTM 
version because SQL Server supports mixed 
instances levels. If you don’t determine these 
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types of requirements during the assessment 
phase, by the time you start planning and 
deploying your consolidated environment, 
youll end up with an instance of SQL 
Server that’s incompatible with your appli- 
cation. Similarly, you probably haven't tested 
your inhouse applications yet against SQL 
Server 2005. If SQL Server 2005 will be the 
end state, assuming your applications will 
work is a big gamble. Even if your end state 
will be another version of SQL Server 2000 
(e.g., SQL Server 2000 SP4), the same rules 
apply. You need to work with the proper 
internal teams to ensure that everything will 
work after consolidation occurs. 

Finally, you need to be aware of each 
applications dependencies and make sure 
these dependencies are documented. A 
dependency can be anything from con- 
nection strings to other servers (possibly 
including everything in the chain up to your 
Active Directory—AD—-server) that the 
application needs to run. You especially need 
to take into account how every application 
server connects to the database server so 
that you can reconfigure each application to 
work after consolidation. Never assume that 
moving a database to another SQL Server 
machine is a transparent process; you might 
affect the availability of other applications 
and servers. You should identify these risks 
during your assessment so that you can care- 
fully schedule the database move when the 
consolidation is planned. 


Performing the Assessment 

If gathering the data sounds difficult, be 
aware that analysis can be even trickier. Ide- 
ally, analysis is performed by an experienced 
person (most likely, your most senior DBA 
or a consultant) who truly understands the 
environment, the applications, the business, 
and most importantly, SQL Server. Simply 
providing an inventory of SQL Server 
instances and crunching numbers based on 
performance metrics is inadequate and often 
employs faulty logic (e.g., just because Server 
A is at 10 percent CPU and Server B is at 
40 percent CPU doesn’t necessarily mean 
that you can combine the two servers). You 
must consider many factors that in some 
cases don’t correlate very well—for example, 
you can’t realistically compare a Pentium III 
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processor at 50 percent utilization to a new 
quad-core processor). A good analysis ulti- 
mately results in a document that outlines a 
proposed consolidated environment. 

Be careful that you don’t propose an unre- 
alistic consolidation that someone higher up 
will then expect you to deliver. Management 
typically wants to hear consolidation ratios 
such as 20:1 or 10:1. However, not every 
server, instance, and database can be consoli- 
dated; some might need to remain separate. 
Many third-party applications must be on 
dedicated servers or completely separated 
from other applications, which makes them 
tough candidates for consolidation. Other 
factors that dictate whether an application 
or database can be consolidated include 
SLAs, high performance requirements, and 
specialized requirements (e.g., split-mirror 
backups). I usually start by trying to help my 
customers cut their number of servers and 
instances in half, then whittle it down from 
there. Be prepared to defend your design 
against overzealous and idealistic managers. 

As you perform your assessment, keep 
in mind that although consolidation will 
yield fewer physical servers or SQL Server 
instances, the number of databases you're 
managing will most likely remain the same 
unless some of your databases are decommis- 
sioned in the process. In other words, your 
job as a DBA won't get any easier—and 
in the short term, until you adjust to your 
new environment, your job will probably 
be more difficult because what you were 
familiar (and comfortable) with no longer 
exists. You'll need to be more vigilant about 
administration and monitoring because 
aspects of SQL Server that you previously 
didn’t need to worry about affecting others 
will now have to share the same instance 
or server and most likely the same CPUs, 
memory, and disk I/O. For example, when 
on individual servers, your database backups 
might have all kicked off at midnight. In 
your consolidated environment, you might 
suddenly have multiple databases on the 
same server that all back up to the same 
drive simultaneously, causing oversatura- 
tion of your disk I/O. You'll need to adjust 
your administrative processes accordingly 
to avoid these conflicts. During the assess- 
ment phase, carefully consider the end state 
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and remember that you'll have to live with 
the consequences of your consolidation 
decisions. (As a side note, be aware that 
consolidation can lead to outsourcing; see 
the Web-exclusive sidebar “Outsourcing,” 

_http://wwwisqlmag.com, InstantDoc ID 
95467, for more information.) 

Part of your assessment should include 
an organizational readiness evaluation. For 
example, if youre proposing a new deploy- 
ment standard for SQL Server instances such 
as SQL Server 2005 SP2 on clusters for 
availability, you need to determine whether 
all of your applications work with the new 
standard. You also need to ensure that all of 
your DBAs are trained in the new standard. 
In addition, you need to determine how 
your administrative and operational proce- 
dures and processes will change, as well as 
whether your current administration and 
monitoring utilities will still work—or what 
kind of alteration will be necessary for them 
to work in your new environment. Finally, 
you need to determine whether you have 
the budget to make all the necessary changes 
that will ensure a seamless consolidation. 


At the End of the Day 


SQL Server consolidation is an increasingly 
popular trend in the constant struggle to 
achieve more with fewer resources. However, 
you must approach consolidation objectively. 
Consolidation for its own sake won't benefit 
your company. Management might push 
hard for consolidation, but implementing it 
incorrectly will do more harm than good. 
Although consolidation is ultimately imple- 
mented as a technology solution, it is, at its 
core, a business strategy. As a DBA, you must 
understand your business’ motivation for 
consolidation and work with, rather than 
against, those goals in your assessment and 
analysis. Only after you perform the analysis 
can you decide how to actually consolidate 
your environment. Taking the time to gather 
the proper information, performing the 
appropriate analysis, and making informed 
decisions leads to a consolidated SQL Server 
environment that can become a huge asset 
not only to your business but also to the 
DBAs who are responsible for managing the 
environment day-in and day-out. E 
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Product Review 


by John Green 


SonaSafe for 
SOL Server 


onasoft’s SonaSafe application software takes the heavy lifting out of Microsoft SQL 

Server and Exchange Server backup and recovery and database replication for standby 
servers, as well as file system backup and recovery. Although for this article, I focus on 
SonaSafe for SQL Server, SonaSafe works much the same way with Exchange servers 
and file servers. Created for backing up to disk, the SonaSafe Backup server supports local 
disk, SAN, and NAS for backup storage. SonaSafe for SQL Server supports SQL Server 
2005 or SQL Server Express, SQL Server 2000 (including Microsoft SQL Server Desktop 
Engine—MSDE), and SQL Server 7.0 and includes support for SQL Server clusters. 
SonaSafe installs on a system running Windows Server 2003, Windows XP Professional 
Edition, or Windows 2000 Server and requires Microsoft IIS. Sonasoft recommends a 


minimum of 1GB RAM and a 2GHz processor on the backup server. 


Architecture 

Note that although I use the term backup 
server to refer to the system you install the 
SonaSafe application on, understand that 
backup server also refers to standby servers. 
SonaSafe supports both one-to-many and 
many-to-one standby server database struc- 
tures. When restoring backups, SonaSafe 
supports both point-in-time and point- 
of-failure restores, in addition to full and 
redirected database restoration. 

The application installed on the backup 
server communicates with agents installed 
on the SQL Server machines it works with. 
Because SonaSafe is a Web-based applica- 
tion, the management console is acces- 
sible from most anywhere. Sonasoft says 
the SonaSafe agent uses about 15MB of 
memory and about 2 percent to 3 percent 
of the processor on target servers. The agent 
executes the jobs you define and manages 
the movement of data between a SQL 
server and the backup or standby server. 
Within the Web console, you tell SonaSafe 
about the SQL Servers systems it'll work 
with and the work you want it to do. 
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Sonasoft requires a license for each server 
and application that participates (e.g., a 
license for the SonaSafe backup server and 
a license for each SQL Server machine that’s 
a source, target, or standby server—multiple 
SQL instances are allowed). 


Hands On 

Installing SonaSafe wasn’t difficult. Sonasoft 
provides a preinstallation check routine to 
ensure that the server has the prerequisite 
software installed. Next, I installed the 
SonaSafe for SQL agent on several SQL 
Servers—nothing unexpected here. I made 
frequent use of the browser-based SonaSafe 
for SQL Server Users Guide, which was 
clearly written and logically organized. I 
later discovered one glitch, however, when 
none of the jobs I created would run: 
SonaSafe needs the NetBIOS form of all 
server names, and I had entered the full 
DNS form. 

The console interface is organized func- 
tionally. On the left, Backup, Recovery, and 
Standby menu options expand to expose 
related tasks. The Disaster Backup Plan 
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SERVER VERSION 3.5.9.6 

PROS: Simplifies creating, scheduling, 

and tracking SQL Server database backups; 
automates log shipping to maintain standby 


database servers; automates several scenarios 
of database recovery 


CONS: Somewhat clumsy Ul; lacks an ease-of- 
use I'd expect in a version 3 product 


RATING: WX 


PRICE: Pricing starts at $1995 for the 
application, $1295 for each standby server, 
and $995 for each SQL Server backup source 
or target. 


RECOMMENDATION: It's still a bit rough but 
can save a lot of time. Give it a test drive if you 
prefer not to create backup scripts manually. 


CONTACT: Sonasoft © 408 927 6200 e http:// 
_www,sonasoft.com 


demonstrates the simplicity of the tem- 
plate-based approach—fill in a few fields, 
and SonaSafe creates and schedules jobs 
to perform daily full database backups and 
hourly differential backups with transaction 
backups every minute for each database of 
the instance. 

SonaSafe for SQL Server is a great 
concept for a product. The current version, 
however, has a number of rough edges I 
found myself working around. Rough edges 
or not, SonaSafe’s ability to create, schedule, 
and track database backups will save you a 
lot of time. SonaSafe’s ability to select the 
correct set of backups for the task—often 
an arcane task done manually—can not only 
save you time, but also the consequences of 
restoring from the wrong set of backups. 
If that’s attractive to you, give SonaSafe for 
SQL Server an audition and see how it 
works for you. SOL 
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Product Review 


by Jason Picker 


Spotfire DKP 
Professional 1.1 


polfire DXP Professional 1.1 is a business intelligence tool with a set of rich visualiza- 
tions for quickly finding the information you need to make decisions. It is part of a suite 
of products that includes Spotfire DXP Enterprise Player and Spotfire Analytics Server. 


DXP Professional can pull data from 
many sources: for example, relational data- 
bases, text files, Excel files,and even SAS data 
files. However, DXP Professional cant pull 
data from Microsoft SQL Server Analysis 
Services (SSAS). The entire dataset is down- 
loaded to the client machine and cached in 
memory. All aggregations and calculations 
are performed locally as well. This can put 
some strain on the local machine’ resources. 
I recommend running DXP Professional 
on a machine with at least 1GB of RAM 
installed. 

You can choose among ten different 
visualizations in DXP Professional that 
make analyzing your data a breeze. The 
Cross Table visualization contains a feature 
called Continuous Color, which sets the fill 
color of each cell according to its relative 
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position within the range of values. In the 
Bar Chart, you can choose to “auto-bin” 
numeric values placed on the X-axis. Doing 
so creates bins or buckets to group the data. 
You can adjust the number of bins dynami- 
cally by using a slider bar. I used this feature 
with the Baseball dataset that’s included in 
the DXP Professional samples to discover 
that those players paid between 13 and 17 
million dollars in 2005 had fewer hits than 
all other salary groups but had the third 
highest number of home runs, as Figure 
1 shows. Scatter Plot lets you quickly and 
easily set the color, shape, and size of the 
data points by dragging a column to the 
appropriate section in the legend. You can 
select a data point or group of data points 
in any visualization and the application 
automatically highlights the same data in 
all the other visualiza- 
tions. The DETAILS- 
ON-DEMAND pane 
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> > FIGURE | A Bar Chart visualization 
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which is organized into 
pages. Each page can 
contain any number of 
visualizations and anno- 
tations, making it easy to 
create a dashboard-style 
interface. The pages can 
be represented as tabs or 
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SPOTFIRE DXP PROFESSIONAL I.1 


PROS: Easy-to-use visualizations; ability to 
combine data from multiple sources; supports 
offline analysis; intuitive filtering capability; 
smart annotation features 


CONS: Client machine does all the work; can’t 
pull data from Microsoft Analysis Services; 
limited to a single dataset per DXP file 


RATING: WX WIL 


PRICE: Starts at $3,000 for a 3-year 
renewable license; volume pricing is available 


RECOMMENDATION: If your users are tired 
of plain old reports, DXP Professional can help 
you quickly convert your data into information 
by using a rich set of visualizations. 


CONTACT: Spotfire e 800-245-4211 eè http:// 
www.spotfire.com 


as a series of numbered hyperlinks with Back 
and Next links for creating a guided work- 
flow, also called guided analysis. When saving 
the file, you can choose to link or embed 
the data. Annotations can also be added to 
the pages and can contain formatted text, 
images, or links. 

DXP Professional makes it easy to filter 
data. Filters are automatically created for each 
column in your data set. The application bases 
its determination of the type of filter on the 
data type of each column and the number of 
distinct values. 

If your company is using SSAS and 
you're in the market for a good business 
intelligence tool, you might want to look 
elsewhere. However, if you want a tool that 
can generate quick insight into the data you 
have sitting in a relational database or Excel 
spreadsheet, you should consider DXP 
Professional. 

Editors note: To read the full-length 
version of this review, go to http://www 
sqlmag.com and enter InstantDoc ID 
95537. Sou 
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Jason Picker (jason@solidqualitylearning.com) is a 
mentor with Solid Quality Learning and has more than 10 years 
of OLAP experience. 
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New Products 


WEB DEVELOPMENT 


Blake Eno (products@windowsitpra com) is product editor for Windows IT Pro and SQL Server Magazine. 


Efficiently Execute Your Web 
Development and Analysis Tasks 


lEWatch Software announced [EWatch Professional 4.5, a plug-in for Microsoft 
Internet Explorer (IE) that helps you more efficiently perform your Web development 
and analysis tasks. IEWatch Professional lets you debug HTTP and HTTP Secure 
(HTTPS) traffic and analyze cookies and POST and GET information. You can use the 


iewatch.com. 


DATABASE MONITORING 


Track Unauthorized 
Data-Access Attempts 


RippleTech announced Informant 2.0, 
a database- and application-monitoring 
solution that provides an audit trail for all 
database activity and real-time alerting. 
Informant can monitor logins, logouts, 
and failed logon attempts and alert on 
unauthorized access and access to specific 
objects. Informant now includes a Web- 
based administration console that’s remotely 
accessible. Other upgrades include role- 
based access to reports, secure management 
of audit logs, and centralized reporting across 
all tier-one database servers. For more infor- 
mation, contact RuippleTech at 866-739- 


8587 or http://www.rippletech.com. 


.NET APPLICATION 
MONITORING 

Real-Time Notification for 
Performance or Code 


AVIcode’s .NET application-monitoring 
solution, Intercept Studio 4.0, provides 
operational monitoring that simultaneously 
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software's filter and search capabilities to dissect the HTTP 
traffic related to Web analysis tracking code. The HTTP per- 
formance chart helps systems administrators analyze HTTP 
data and identify performance bottlenecks. Pricing for 
IE Watch Professional starts at $169 for a single-user license. For 
more information, contact [EWatch Software at http://www_ 


monitors and delivers real-time notifica- 
tion of performance and code problems. 
The software’s disconnected-monitoring 
capabilities enable monitoring and root- 
cause data collection on non-networked 
computers. Intercept Studio also collects 
and correlates key performance counters, 
provides computer-centric and application- 
centric views, and features on-the-fly con- 
figuration. Pricing for Intercept Studio is on 
a per-monitored computer basis and starts at 
$12,000 for a single monitoring console and 
server agent. For more information, contact 
AVIcode at 443-543-0030 or http://www 


DATABASE ADMINISTRATION 


AquaFold announced updates to its data- 
base query and administration tool, Aqua 
Data Studio 6.0, which supports key data- 
base platforms such as SQL Server, Oracle, 
IBM DB2, and MySQL. This release features 
new database-administration tools for SQL 
Server that let you view, modify, visualize, 
and maintain storage resources while you 
manage users, roles, profiles, and database 
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sessions. A new set of difference tools 
provides a quick look into the differences 
between two complete database schemas. 
You can view the differences between 
individual schema objects, files, directory 
structures, and SQL statements. Pricing for 
Aqua Data Studio starts at $399 for 1 to 
20 licenses. For more information, contact 


AquaFold at _http://www.aquafold.com. 


DATABASE AND 
DEVELOPMENT TOOLS 


Red Gate Provides Easy 
Access to All Its Tools 


Red Gate Software’ new service, SQL 
Professional Toolbelt, gives you access to all 
of Red Gate’s SQL Server tools—including 
SQL Compare Pro, SQL Packager, SQL 
Backup Pro, SQL Log Rescue, and SQL 
Dependency Tracker—through an annual 
subscription. The initial subscription, which 
offers 12 months of support and upgrades, 
costs $1995; subscription renewal is $995 per 
year. You can download a free trial of SQL 
Professional Toolbelt at Red Gate’s Web site. 
For more information, contact Red Gate 


Software at http://www.red-gate.com. 


MONITORING AND ALERTING 


Monitor the Status of Your 
SOL Server Infrastructure 


Pearl Knowledge Solutions announced 
SQLCentric 2.0.1, a Web-based monitoring 
and alerting solution for SQL Server 2005 
that features a new Web report module 
along with in-depth details and trend anal- 
ysis of the state of your SQL servers. With 
SQLCentric, you can monitor multiple 
instances for uptime, network connectivity, 
services status, cluster failover, memory 
status, job status, database status, blocked 
processes, and logged events. Pricing for 
SQLCentric starts at $995 for a 5-license 
kit. For more information, contact Pearl 
Knowledge Solutions at 917-499-7622 or 


http://www.pearlknows.com. [SOL] 
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Hear from Gartner Analyst John Enck about the latest trends in 

Windows and the *Nix platforms. Join industry experts Michael New York 
Otey, Darren Mar-Elia and Brian Komar for practical tips to man- 

age and secure your heterogenous environment. May 1 


TechX has something for everyone, regardless of whether you Washington, DC 
consider yourself a Windows person or a Linux/UNIX diehard. It's May 3 
the one event where the two camps can come together to find 


common. ground San Francisco 
\ sampling ill learn at Techx: May 8 

| Where to find and how to use new tools to help M Overview of today’s virtualization tech- 
you monitor your heterogenous environment nologies from hardware to applications and y 

Mi How to manage and secure your non-Window practical tips for using virtualization 
systems using Group Policy [Z] How virtualization can help you solve com- 

| How to break down the past communication bar- mon IT challenges such as server sprawl, f 
riers between Windows and *Nix systems deployment and testing 


[Z| How to secure and manage data and access 
management across heterogenous environments 


Hurry, seats are filling fast! Register today 
at www.windowsitpro.com/go/techx2007 


Windows IT Pro and Microsoft are bringing you a live, 
under-the-hood look at Longhorn this spring, featuring 
virtualization, web services and core reliability 
breakthroughs. You will get in-the-trenches insights 
from the smartest people in the business, and you 
will walk away knowing what new features and 
enhancements are included. Plus, you will receive 
practical advice on how you can maximize Longhorn 
in your environment. 
Lae Seen. eh Se NN T [ae ae a, ees i 
Space is limited, so register today at www.windowsitpro.com/go/longhorn 
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Insights from the SQL Server industry 


Industry Bytes 


The Year of Green Storage 
Ife work in a data center, you're 
probably familiar with green storage, 
environmentally friendly storage that uses 
less power and physical space than con- 
ventional technologies. Gartner predicts 
green storage will be the most hyped data- 
storage trend in 2007. I talked to Pillar 
Data Systems’ (http://www.pillardata 
.com) Senior Director of Marketing and 
Strategy Russ Kennedy and Director of 
Worldwide PR Chris Drago, to learn 
more about why it’s important for compa- 
nies to start thinking about green storage. 

Drago and Kennedy provided some 
astonishing statistics: In 2005, Gartner 
reported that US companies spent $6 bil- 
lion powering data centers. Gartner also 
predicts that by the end of 2008, nearly 50 
percent of data centers worldwide won't 
have sufficient power or cooling to support 
high-density equipment. 

Kennedy and Drago stressed that com- 
panies need to use their storage more 
efficiently and think about consolidating 
multiple storage platforms. “The prolif- 
eration of different boxes can’t continue,” 
warned Kennedy; Pillar Data Systems 
asserts that the limitation of physical space 
is one of the largest challenges facing IT 
pros today, especially those working in 
densely populated regions. Insofar as every 
company with a growing need for physical 
space (or Information Lifecycle Manage- 
ment—ILM—storage tiers) increases power 
consumption, the need for energy efficiency 
will drive the green storage movement. 

Pillar Data Systems’ goal is to make 
green storage as simple as possible and 
to help users understand what efficiency 
is all about. Attaining this goal means 
maximizing application performance and 
data capacity with the most efficient use 
of floor space and power consumption. 
To determine the efficiency of Pillar Data 
Systems’ storage products, the company 
uses an efficiency quotient (EQ), where 


On-Demand, Any-Level Training for SQL Server Users 


ust-in-Time (JIT) learning is always a tall order to fill. As the technology industry 

moves toward a business model that demands smaller workforces utilizing broader, 
cross-disciplinary skill sets, on-demand technical training is increasingly necessary to 
properly mobilize business. This trend is industry-wide, but it’s especially pronounced 
in the SQL Server arena. Anecdotal evidence gathered by SQL Server Magazine edi- 
tors concludes that many DBAs and network administrators using SQL Server are 
new to the database. Because of SQL Server's increasing back-end ubiquity, users 
who have no previous experience are suddenly staring at a SQL Server 2005 inter- 
face and being asked to make it work with anything from Microsoft Office Access 
to SharePoint technologies. 

Even when CFOs want to spend the necessary money to train these users, 
options are limited. “In the tech training industry, your delivery options are usually 
either a three-day seminar or a thin-content, e-learning solution,” counsels AppDev 
Products (http://www.appdev.com) President and CEO Craig Jensen. He says the 
problem with seminars is that they tend to be expensive and time-consuming, and 
that the information is “too much, too fast” to be retained. The problem with most 
e-solutions is that they're low-bandwidth, don’t cover information deeply enough, 
and have no reinforcement mechanisms. 

Jensen believes AppDev has developed an alternative that will let SQL Server 
users customize training programs to their specific needs, rather than choose a solu- 
tion according to its delivery mechanism. AppDev has produced an updateable series 
of training modules that can be delivered in a variety of formats, and customized to 
fit specific job descriptions and learning styles.The service package, called KSoutce, is 
a collection of high-bandwidth, rich-content video modules led by industry experts, 
and reinforced with interactive labs, pre- and post-module exams, and instructor-led 
coursework. AppDev offers KSource as both an updateable hardware appliance that 
can be installed behind the company firewall, as well as a broadband, high-bandwidth 
stream originating from its headquarters’ servers. Check out http://www.appdev 
.com/rs_entry.asp for a free trial run to judge the quality of the content yourself. 


—Sam Davenport 
InstantDoc ID 95071 


EQ = capacity (GB) X performance (I/O 
operations per second—IOPS) / power 
(watts) X space (square meters). When 
Pillar Data Systems compared its EQs with 
some of its competitors’ products’ EQs, it 
found that its products were often twice 
as efficient. The relatively high EQs of 
Pillars products stem from the company’s 
ability to consolidate multiple applica- 
tions and tiers of storage into single plat- 
forms. Additionally, the company is able to 
deliver multiple storage technologies such 
as Fibre Channel and Serial ATA (SATA) 
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from just one platform. Going forward, 
be sure to think about green storage, and 
talk with your vendors to help you deter- 
mine how efficient your storage really is. 

Is storage efficiency a concern in your 
organization? Does your company plan to 
address this concern in the next couple of 
years? Tell us your thoughts at our storage 
forum: http://forums.windowsitpro.com/ 
web/forum/messageview.aspx?catid=46&t 


hreadid=83847 &enterthread=y. SOL 


—Blake Eno 
InstantDoc ID 95192 
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Connecting readers with the 
products and services 
they need most. 
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A SPECIAL MONTHLY 


ADVERTISING SECTION 


If you would like your products 
and services featured among 
these pages, call: 


Key Accounts Director 
Richard Resnick 
800-949-4007 
rresnick@sqimag.com 


Additional advertising information is also available 
at www.sqimag.com 
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FIRST Advantage 


SQL DATABASE DISASTER RECOVERY 


Even a well configured, fault tolerant MSSQL Server can fail! 


» I/O errors 
+ "Suspect" mode 
» Deleted or corrupted log file 


» Deleted data (tables, records, system objects) 


» Corruption caused by RAID failure 
» Corrupted backup file 
+ Torn pages 


IT IS POSSIBLE TO RECOVER! 


In most cases it is possible to repair the database to an attachable state. If we are unable to 
repair the database to a point where it will attach, we will recover as many tables and 

records as possible with our specialized software tools. This recovered data can then be 
merged back into an empty database that your front end application will work with. 


Call us at 877.304.7189 or e-mail mssqi@datarecovery.net 


www .datarecovery.net 


“The ability to maintain 
clean, reliable marketing 
data across multiple capture 
mediums is essential for 
everything we do.” 


- Kevin Parker, Digital Evergreen 


Address Object API 


Verify, cleanse and format 
customer data at the point of 
entry or in batch. Easily i í 
integrate with .Net, MS SQL 
or Java. 


Woo’ Zew bS'MMM 


60085 


Get a demo: 
1-800-MELISSA 


MELISSAIDATA 


Your Data Superstore 


To download free evaluation copies, visit: 
www.teratrax.com or calf 1-800-370-5886 


Learn Microsoft® SQL Server" 2005 
and Business Intelligence (BI) 


Introducing the latest in SQL Server 2005 and Business Intelligence (BI) courses from AppDev, the nation’s leader in developer 
learning. Our nationally recognized industry experts will walk you step-by-step through the features and functionalities of these 
exciting SQL Server 2005 technologies! 
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Microsoft SQL Server 2005 20 CD-ROMs or 2 DVD-ROMs 


Microsoft SQL Server 2005 Reporting Services (SSRS) 8 CD-ROMs or 1 DVD-ROM 
Microsoft SQL Server 2005 Analysis Services (SSAS) 8 CD-ROMs or | DVD-ROM 


Microsoft SQL Server 2005 Integration Services (SSIS) 8 CD-ROMs or 1 DVD-ROM 


LOOZ Aew 


BUY 1, GET 1 


For a limited time, purchase one of our SOL Server 2005 courses above, get another SOL Server 2005 course 
(of equal or lesser value) FREE! Or get all four SOL Server 2005 courses in one money-saving learning suite! 


Visit our Web site today for offer details, plus course outlines and 
AppDev Expert Andy Baron more information about our new SQL Server 2005 courses. 


Bopp. ada? 
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Same great training, now for your entire team—KSource Online Learning™ e www.ksourceit.com 


by Michael Otey 


Virtualizing SOL Server 


irtualization is a rapidly changing technology that’s useful for server consolidation, testing 
and deployment, training, and disaster recovery. Stay current with this hot technology 
by reviewing these best practices for running SQL Server in a virtual environment. 


~ Say Yes to 64-bit Host 
Support. 

BothVMwares VMware Server and Micro- 
soft Virtual Server 2005 Release 2 (R2) 
support 64-bit architecture on the host 
side, which raises host memory capabilities 
to 1TB, enabling many more active virtual 
machines (VMs).VMware Server also sup- 
ports 64-bit guests. 


Allocate Enough Memory. 
Each VM needs enough of the host’ 
physical memory to support the guest 
server, plus 32MB per VM to cover VM 
overhead. For example, if you migrate 
a SQL Server system that requires 1GB 
(1024MB) of RAM, be sure to allocate 
1056MB (1024MB + 32MB) for the VM. 
Always leave RAM for the host—about 
512MB. 


Use a Virtual SCSI Controller. 


Although you can choose between using a 
virtual IDE controller and using a virtual 


THE EMPLOYEE OF 
THE MONTH IS TINA, 
FOR ALL OF THE UM... 
VARIOUS WORK THAT 

SHE DOES. 


® By Scott Adams 


SCSI controller, usually the virtual IDE 
controller is the default. However, you 
can get up to a 20 percent performance 
boost by using the virtual SCSI controller 
instead. 


Use Different Drives 

of a SAN. 
Create the VM’s virtual hard drive on a 
different physical drive from the host’s 
OS to reduce drive-spindle contention 
and improve the performance of SQL 
Server running in a VM. Better yet, use a 
SAN, which can increase I/O capabilities, 
improving VM performance. 


‘Preallocate Storage. 

Most virtual hard drives are configured 
with a default setting to dynamically 
expand as VM storage needs increase. This 
default setting saves storage space but slows 
performance. Instead, choose the preallocate 
storage setting (aka Fixed VHD) to avoid 
this performance hit. 


YOU HAVE STRIPPED 
THIS AWARD OF ITS 
MEANING BY SHOWING 
THAT YOU DON’T EVEN 
KNOW WHAT MY JOB IS. 


scottadams@aol.com 


www.dilbert.com 


~ Add Clustering. 


Important for server consolidation, clus- 
tering combines with virtualization to 
increase availability at either host of the 
VM guest levels. Virtual Server 2005 R2, 
VMware Server, and ESX Server sup- 
port VM guest clustering, in which each 
guest participates as a cluster node with 
nodes located on other hosts. In addition, 
Virtual Server 2005 R2 supports host clus- 
tering, in which the host can be the cluster 
node and—in the event of a failure— 
the host and all of its VMs can fail over 
to a backup and be automatically restarted. 


Use Multicore CPUs. 

Multicore CPUs from AMD and Intel 
support virtualization directly in the CPU. 
(AMD° virtualization support is always 
enabled; Intel’s must be enabled through a 
BIOS setting.) Virtualization products run 
one thread per VM in each separate core, 
and adding cores increases the performance 
of multipleVMs. Note that at the guest level, 
SQL Server is licensed by CPU, and virtual 
CPUs configured in the guest VM must be 
licensed. Sou 
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Unprecedented Visibility and Control over your Enterprise 


SQL Sentry Event Manager is the ultimate scheduling, alerting and response system for optimizing 
schedule performance of database servers and related IT resources. Event Manager provides DBAs 
with unparalleled capabilities for managing SQL Agent jobs, Windows Tasks, and Oracle jobs in 
increasingly complex cross-platform environments. 


Key Features: 
> Visual Schedule Management 
> Alerting and Response System 
> SSIS and DTS Support 
> Cross-platform Support 
> Chaining and Queuing 
> Schedule Performance Monitoring 


Free Trial Download at: www 


Key Benefits: 
> Easy to install and use 


> Distributed "Agent-less" deployment 
> 100% .NET based application 


> Lower database administration costs 
> Reduce down time 


> Improve application performance 


.sqisentry.net 
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TOOLS FOR DATA MANAGEMENT 


Microsoft 
GOLD CERTIFIED 
Partner 
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Superhero 


| STAND TALL. Idera made me a real life SOL Server Superhero. 


Idera delivers a new generation 

of tools for managing the 

world's fastest growing database 
management system — Microsoft 
SQL Server. Battle-proven and 
engineered for the enterprise, Idera 
helps database administrators keep 
SQL Server running at optimum 
performance, ensure availability, 


speed recovery, ease compliance 
requirements, and dramatically 
reduce administrative overhead. 

All of Idera’s products are amazingly 
simple to use, provide remarkable 
results, and can be installed in 
minutes, configured in hours, and 
deployed worldwide in days. 


